LLamEYE

LLamEYE is built for AI application developers shipping production applications on top of LLMs. It provides tooling to fine-tune, serve, deploy, and monitor these models, streamlining the end-to-end LLM deployment workflow.

Features that stand out

- Serve LLMs over a RESTful API or gRPC with a single command. Interact with the model through a Web UI, the CLI, the Python/JavaScript clients, or any HTTP client of your choice (see the serving sketch after this list).
- First-class support for LangChain, BentoML, and Hugging Face Agents, e.g. tie a remote self-hosted OpenLLM server into your LangChain app (see the LangChain sketch below).
- Token streaming support.
- Embedding endpoint support.
- Quantization support.
- Fuse existing model-compatible pre-trained QLoRA/LoRA adapters with the chosen LLM by adding a flag to the serve command (see the adapter sketch below). This feature is still experimental.
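As a minimal sketch of the serve-and-query workflow: the exact CLI invocation, endpoint path, and JSON shape below are assumptions for illustration, not confirmed by this README; any HTTP client would work the same way.

```python
# Start a server first (hypothetical CLI invocation, assumed syntax):
#   llameye serve facebook/opt-1.3b --port 3000
#
# Then query the RESTful API with any HTTP client. The /v1/generate
# path and request body are assumptions for illustration.
import requests

resp = requests.post(
    "http://localhost:3000/v1/generate",
    json={"prompt": "What is the capital of France?", "max_new_tokens": 64},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```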
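For the LangChain integration, a sketch using LangChain's OpenLLM wrapper, assuming the self-hosted server speaks an OpenLLM-compatible HTTP protocol (the URL is a placeholder):

```python
# Minimal LangChain sketch; assumes an OpenLLM-compatible HTTP server
# is already running at the given URL (placeholder).
from langchain_community.llms import OpenLLM

llm = OpenLLM(server_url="http://localhost:3000")
print(llm.invoke("Summarize the benefits of self-hosting LLMs."))
```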
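The README does not spell out the adapter flag itself, but fusing a pre-trained LoRA adapter into a base model generally amounts to the following Hugging Face PEFT sketch; the model and adapter IDs are placeholders:

```python
# What adapter fusing amounts to under the hood, via Hugging Face PEFT.
# Model and adapter IDs below are placeholders, not from this README.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
# Attach a model-compatible, pre-trained LoRA adapter:
model = PeftModel.from_pretrained(base, "some-user/opt-1.3b-lora")
# Merge ("fuse") the adapter weights into the base model:
model = model.merge_and_unload()
```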