## Get started

### Docker

The easiest way to get started is with the official Docker image:

```bash
# Create the data directory and change its owner to uid 1000
# (Tabby runs as uid 1000 inside the container).
mkdir -p data/hf_cache && chown -R 1000 data

docker run \
  -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  tabbyml/tabby
```

To use the GPU backend (Triton) for faster inference:

```bash
docker run \
  --gpus all \
  -it --rm \
  -v ./data:/data \
  -v ./data/hf_cache:/home/app/.cache/huggingface \
  -p 5000:5000 \
  -e MODEL_NAME=TabbyML/J-350M \
  -e MODEL_BACKEND=triton \
  tabbyml/tabby
```

Note: to use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.

You can then query the server using the `/v1/completions` endpoint (a Python equivalent is sketched at the end of this section):

```bash
curl -X POST http://localhost:5000/v1/completions -H 'Content-Type: application/json' --data '{
    "prompt": "def binarySearch(arr, left, right, x):\n    mid = (left +"
}'
```

We also provide an interactive playground in the admin panel at [localhost:5000/_admin](http://localhost:5000/_admin):

![image](https://user-images.githubusercontent.com/388154/227792390-ec19e9b9-ebbb-4a94-99ca-8a142ffb5e46.png)

### SkyPilot

See [deployment/skypilot/README.md](./deployment/skypilot/README.md).

## API documentation

Tabby opens a FastAPI server at [localhost:5000](http://localhost:5000), which embeds the OpenAPI documentation of the HTTP API.

## Development

Go to the `development` directory, then run:

```bash
make dev
```

or

```bash
make dev-triton # Turn on the Triton backend (for developers with a CUDA environment)
```
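
As referenced under "Get started" above, the curl request can also be issued from Python. This is a minimal sketch using only the standard library; it assumes the server from the Docker example is running on localhost:5000, and it prints the raw JSON response rather than assuming a particular response schema (the exact schema is documented on the server's OpenAPI page).

```python
import json
import urllib.request

# Same request body as the curl example under "Get started".
payload = json.dumps({
    "prompt": "def binarySearch(arr, left, right, x):\n    mid = (left +"
}).encode("utf-8")

# With a data payload, urllib sends a POST request by default.
req = urllib.request.Request(
    "http://localhost:5000/v1/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Print the raw response; consult the server's OpenAPI docs for its schema.
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```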