# Tabby Model Specification (Unstable)

Tabby organizes each model within a directory. This document explains the contents that directory must provide to support model serving. A minimal Tabby model directory includes the following:

```
ggml/
tabby.json
```

### tabby.json

This file provides meta information about the model. An example file appears as follows:

```json
{
    "prompt_template": "<PRE>{prefix}<SUF>{suffix}<MID>",
    "chat_template": "<s>{% for message in messages %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + '</s> ' }}{% endif %}{% endfor %}"
}
```

The `prompt_template` field is optional. When it is present, the model is assumed to support fill-in-the-middle (FIM) inference.

One example of a `prompt_template` is `<PRE>{prefix}<SUF>{suffix}<MID>`. In this format, `{prefix}` and `{suffix}` are replaced with their corresponding values, and the resulting prompt is fed into the LLM.
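
As an illustration, the sketch below fills such a template by plain string substitution. The helper name `build_fim_prompt` is hypothetical and not part of Tabby's API; it only demonstrates how the placeholders are resolved.

```rust
/// Hypothetical helper: fill a FIM prompt template by replacing the
/// `{prefix}` and `{suffix}` placeholders with the code around the cursor.
fn build_fim_prompt(template: &str, prefix: &str, suffix: &str) -> String {
    template
        .replace("{prefix}", prefix)
        .replace("{suffix}", suffix)
}

fn main() {
    let template = "<PRE>{prefix}<SUF>{suffix}<MID>";
    let prompt = build_fim_prompt(template, "fn add(a: i32, b: i32) -> i32 {\n    ", "\n}");
    // Prints:
    // <PRE>fn add(a: i32, b: i32) -> i32 {
    //     <SUF>
    // }<MID>
    println!("{prompt}");
}
```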

The `chat_template` field is optional. When it is present, the model is assumed to support instruct/chat-style interaction and can be passed to `--chat-model`.
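
The example `chat_template` above follows the Jinja-style chat template convention. Tabby's actual renderer is not shown here; the sketch below simply reproduces by hand the string that the example template would emit for a short conversation, so the expected prompt shape is visible.

```rust
// A chat message as it would appear in the `messages` array fed to the template.
struct Message {
    role: &'static str,
    content: &'static str,
}

// Hand-rolled equivalent of the example chat_template above, for illustration only:
// user turns are wrapped in [INST] ... [/INST], assistant turns are closed with "</s> ".
fn render_chat(messages: &[Message]) -> String {
    let mut prompt = String::from("<s>");
    for message in messages {
        match message.role {
            "user" => prompt.push_str(&format!("[INST] {} [/INST]", message.content)),
            "assistant" => prompt.push_str(&format!("{}</s> ", message.content)),
            _ => {}
        }
    }
    prompt
}

fn main() {
    let messages = [
        Message { role: "user", content: "Write hello world in Rust." },
        Message { role: "assistant", content: "println!(\"Hello, world!\");" },
    ];
    // Prints: <s>[INST] Write hello world in Rust. [/INST]println!("Hello, world!");</s>
    println!("{}", render_chat(&messages));
}
```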

### ggml/

This directory contains binary files used by the llama.cpp inference engine. Tabby utilizes ggml for inference on `cpu`, `cuda`, and `metal` devices.

Currently, only `q8_0.v2.gguf` (or, starting with version 0.11, `model.gguf`) in this directory is in use. You can refer to the instructions in llama.cpp to learn how to acquire it.
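
As a hypothetical illustration of the naming convention (this is not Tabby's actual loading code), a loader could prefer the newer `model.gguf` name and fall back to the legacy `q8_0.v2.gguf`:

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: locate the GGUF file inside a model's ggml/ directory,
// preferring the newer `model.gguf` name and falling back to the legacy one.
fn resolve_gguf(model_dir: &Path) -> Option<PathBuf> {
    for name in ["model.gguf", "q8_0.v2.gguf"] {
        let candidate = model_dir.join("ggml").join(name);
        if candidate.exists() {
            return Some(candidate);
        }
    }
    None
}

fn main() {
    match resolve_gguf(Path::new("/path/to/my-model")) {
        Some(path) => println!("loading {}", path.display()),
        None => eprintln!("no GGUF file found under ggml/"),
    }
}
```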