Text Generation Inference v2.0.0: a production-ready server for LLM inference, built with Rust, Python, and gRPC.
Inference is the process of running a trained model on the target hardware, so the more we accelerate inference, the faster the model responds.
The new version of Text Generation Inference adds support for the Command R+ model.
TGI is the fastest open-source server for Command R+.
Thanks to Medusa heads, benchmarks reach unprecedented speed: a latency of only 9 ms per token for the 104B model!
Supports popular open-source LLMs: Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and others.
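Once a TGI server is running (for example, via the official Docker image), it can be queried over its HTTP `/generate` endpoint. A minimal sketch is below; the host/port (`localhost:8080`), the prompt, and the parameter values are assumptions for illustration, while the `inputs`/`parameters` payload shape follows TGI's public HTTP API:

```python
import json
import urllib.request

# Request body for TGI's /generate endpoint:
# "inputs" holds the prompt, "parameters" holds sampling options.
payload = {
    "inputs": "Write a haiku about fast inference.",  # example prompt (assumption)
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

def build_request(base_url: str = "http://localhost:8080") -> urllib.request.Request:
    # localhost:8080 is an assumption; use whatever host/port
    # your TGI container was launched with.
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request()
# Sending the request requires a live server:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["generated_text"])
print(req.full_url)      # http://localhost:8080/generate
print(req.get_method())  # POST
```

The response is a JSON object whose `generated_text` field contains the completion; streaming is also available through the separate `/generate_stream` endpoint.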
• Github
• Installation