docs(models-http-api): add llamafile example

Author: Wei Zhang
Date: 2024-11-11 23:00:26 +08:00
Parent: bf031513d5
Commit: 502bb410f6

@@ -0,0 +1,35 @@
# llamafile
[llamafile](https://github.com/Mozilla-Ocho/llamafile)
is a Mozilla Builders project that allows you to distribute and run LLMs with a single file.
llamafile provides OpenAI API-compatible chat-completion and embedding endpoints,
enabling us to use the `openai/chat` and `openai/embedding` kinds.
For completion, however, llamafile's implementation differs from the OpenAI API, and support for it is still a work in progress.
llamafile serves on port `8080` by default, which conflicts with Tabby's default port.
It is therefore recommended to run llamafile with the `--port` option to serve on a different port, such as `8081`.
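As a sketch of the launch itself, assuming you have already downloaded a llamafile (the file name below is a placeholder):

```bash
# Make the downloaded llamafile executable (Unix-like systems),
# then serve it on port 8081 instead of the default 8080.
# "your-model.llamafile" is a placeholder file name.
chmod +x your-model.llamafile
./your-model.llamafile --port 8081
```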
Below is an example Tabby configuration for chat:
```toml title="~/.tabby/config.toml"
# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:8081/v1"
api_key = ""
```
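Because the endpoint is OpenAI-compatible, a plain chat-completions request is an easy way to sanity-check the server before pointing Tabby at it. This is just an illustrative smoke test, with `your_model` as a placeholder name:

```bash
# Hypothetical smoke test against the llamafile server on port 8081.
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your_model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```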
For embeddings, the embedding endpoint is no longer enabled by default in the standard llamafile server,
so you have to run llamafile with the `--embedding` option (and, as above, a dedicated `--port`) and set the Tabby config to:
```toml title="~/.tabby/config.toml"
# Embedding model
[model.embedding.http]
kind = "openai/embedding"
model_name = "your_model"
api_endpoint = "http://localhost:8082/v1"
api_key = ""
```
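The config above assumes a second llamafile instance dedicated to embeddings and listening on port `8082`. A minimal launch sketch, again with a placeholder file name, might look like:

```bash
# Run a separate llamafile with the embedding endpoint enabled,
# on its own port so it does not clash with the chat instance on 8081.
./your-embedding-model.llamafile --embedding --port 8082
```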