From 502bb410f6ad46f2ed03c2a48586d4aeeeaca3ee Mon Sep 17 00:00:00 2001
From: Wei Zhang
Date: Mon, 11 Nov 2024 23:00:26 +0800
Subject: [PATCH] docs(models-http-api): add llamafile example

---
 .../references/models-http-api/llamafile.md   | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 website/docs/references/models-http-api/llamafile.md

diff --git a/website/docs/references/models-http-api/llamafile.md b/website/docs/references/models-http-api/llamafile.md
new file mode 100644
index 000000000..1333eaf15
--- /dev/null
+++ b/website/docs/references/models-http-api/llamafile.md
@@ -0,0 +1,35 @@
+# llamafile
+
+[llamafile](https://github.com/Mozilla-Ocho/llamafile)
+is a Mozilla Builders project that allows you to distribute and run LLMs with a single file.
+
+llamafile provides OpenAI API-compatible chat completion and embedding endpoints,
+so Tabby can connect to it using the `openai/chat` and `openai/embedding` kinds.
+
+However, for completion, llamafile's implementation differs from the standard OpenAI completions API, and support for it is still a work in progress.
+
+llamafile uses port `8080` by default, which is also the default port used by Tabby.
+It is therefore recommended to run llamafile with the `--port` option to serve on a different port, such as `8081`.
+
+Below is an example configuration for chat:
+
+```toml title="~/.tabby/config.toml"
+# Chat model
+[model.chat.http]
+kind = "openai/chat"
+model_name = "your_model"
+api_endpoint = "http://localhost:8081/v1"
+api_key = ""
+```
+
+For embeddings, the embedding endpoint is no longer supported in the standard llamafile server,
+so you have to run a separate llamafile instance with the `--embedding` option (on its own port, such as `8082`) and set the Tabby config to:
+
+```toml title="~/.tabby/config.toml"
+# Embedding model
+[model.embedding.http]
+kind = "openai/embedding"
+model_name = "your_model"
+api_endpoint = "http://localhost:8082/v1"
+api_key = ""
+```
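+
+Both llamafile instances can then be started with the options mentioned above. The commands below
+are a minimal sketch: `your_model.llamafile` and `your_embedding_model.llamafile` are placeholders
+for the llamafile binaries you actually downloaded, and the exact invocation may vary between
+llamafile releases.
+
+```bash
+# Chat model: serve the OpenAI-compatible API on port 8081 instead of the default 8080
+./your_model.llamafile --port 8081
+
+# Embedding model: enable the embedding endpoint and serve it on its own port
+./your_embedding_model.llamafile --embedding --port 8082
+```
+
+Run each instance in its own terminal (or as a background service) so that both endpoints remain available to Tabby.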