import Collapse from '@site/src/components/Collapse';

# ⁉️ Frequently Asked Questions
<Collapse title="How much VRAM a LLM model consumes?">
|
|
|
|
By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.
|
|
|
|
For ROCm the actual limits are currently largely untested, but the same CodeLlama-7B seems to use 8GB of VRAM as well on a AMD Radeon™ RX 7900 XTX according to the ROCm monitoring tools.
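If you want to verify actual VRAM usage on your own hardware, the vendor monitoring tools can report it. A minimal sketch (exact flags may vary with your driver and tool versions):

```bash
# NVIDIA: show used and total VRAM per GPU
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv

# AMD / ROCm: show VRAM usage
rocm-smi --showmeminfo vram
```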
</Collapse>
<Collapse title="What GPUs are required for reduced-precision inference (e.g int8)?">
|
|
|
|
* int8: Compute Capability >= 7.0 or Compute Capability 6.1
|
|
* float16: Compute Capability >= 7.0
|
|
* bfloat16: Compute Capability >= 8.0
|
|
|
|
To determine the mapping between the GPU card type and its compute capability, please visit [this page](https://developer.nvidia.com/cuda-gpus)
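A minimal sketch for querying the driver directly, assuming your `nvidia-smi` is recent enough to support the `compute_cap` query field:

```bash
# Print the name and compute capability of each visible NVIDIA GPU
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```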
</Collapse>
<Collapse title="How to utilize multiple NVIDIA GPUs?">
|
|
|
|
Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm) accordingly.
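A minimal sketch of two instances pinned to different GPUs; the model name and ports are placeholders, and the flags assume the standalone `tabby serve` CLI (with Docker, pass the same environment variable to each container):

```bash
# Instance 1 on GPU 0
CUDA_VISIBLE_DEVICES=0 tabby serve --device cuda --model TabbyML/StarCoder-1B --port 8080

# Instance 2 on GPU 1
CUDA_VISIBLE_DEVICES=1 tabby serve --device cuda --model TabbyML/StarCoder-1B --port 8081
```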
</Collapse>
<Collapse title="My AMD device isn't supported by ROCm">
|
|
|
|
You can use the HSA_OVERRIDE_GFX_VERSION variable if there is a similar GPU that is supported by ROCm you can set it to that.
|
|
|
|
For example for RDNA2 you can set it to 10.3.0 and to 11.0.0 for RDNA3.
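A sketch of running Tabby on an RDNA3 card this way; the model name is a placeholder, and `--device rocm` assumes a ROCm build of Tabby:

```bash
# Pretend to be a GPU with ROCm-supported GFX version 11.0.0 (RDNA3)
HSA_OVERRIDE_GFX_VERSION=11.0.0 tabby serve --device rocm --model TabbyML/StarCoder-1B
```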
</Collapse>
<Collapse title="How can I convert my own model for use with Tabby?">
|
|
|
|
Since version 0.5.0, Tabby's inference now operates entirely on llama.cpp, allowing the use of any GGUF-compatible model format with Tabby. To enhance accessibility, we have curated models that we benchmarked, available at [registry-tabby](https://github.com/TabbyML/registry-tabby)
|
|
|
|
Users are free to fork the repository to create their own registry. If a user's registry is located at `https://github.com/USERNAME/registry-tabby`, the model ID will be `USERNAME/model`.
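A model published in such a fork can then be referenced by that ID when starting the server. A sketch, with `USERNAME/model` standing in for your own registry entry:

```bash
# Load a model from a forked registry by its USERNAME/model ID
tabby serve --device cuda --model USERNAME/model
```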
For details on the registry format, please refer to [models.json](https://github.com/TabbyML/registry-tabby/blob/main/models.json).
</Collapse>
<Collapse title="Can I use local model with Tabby?">
|
|
|
|
Tabby also supports loading models from a local directory that follow our specifications as outlined in [MODEL_SPEC.md](https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md).
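A minimal sketch, assuming `tabby serve` accepts a local directory path in place of a registry model ID and that the directory is laid out as described in MODEL_SPEC.md (verify against the spec for your Tabby version):

```bash
# Point --model at a local directory that follows MODEL_SPEC.md
tabby serve --device cuda --model /path/to/my-local-model
```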
</Collapse>