* For Answer Engine, when the file content is reasonably short (e.g., less than 200 lines of code), include the entire file content directly instead of only the chunk ([#3096](https://github.com/TabbyML/tabby/issues/3096)).
* Allowed adding additional languages through the `config.toml` file (see the sketch below).
* Allowed customizing the `system_prompt` for the Answer Engine (see the sketch below).
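Below is a minimal sketch of what these two options could look like in `config.toml`. The section and field names (`additional_languages`, `answer.system_prompt`) are assumptions inferred from the feature descriptions rather than a verified schema, so consult the configuration reference for the exact keys:

```toml
# Hypothetical sketch: register an extra language for indexing and completion.
# Section and field names are assumptions, not a verified schema.
[[additional_languages]]
languages = ["gleam"]
exts = ["gleam"]
line_comment = "//"

# Hypothetical sketch: override the Answer Engine's default system prompt.
[answer]
system_prompt = "You are a coding assistant for the Acme engineering team."
```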
### Fixes and Improvements
* Redesigned homepage to make team activities (e.g., threads discussed in Answer Engine) discoverable.
* Supported downloading models with multiple partitions (e.g., Qwen-2.5 series).
* The Chat Side Panel implementation has been redesigned in 0.18.0; the following minimum extension versions are required for compatibility:
- VSCode: >= 1.12.0
- IntelliJ: >= 1.8.0
### Features
* User Groups Access Control: Server Administrators can now assign user groups to specific context providers to precisely control which contexts can be accessed by which user groups.
* We've reworked the `Web` (a beta feature) context provider into the `Developer Docs` context provider. Previously added context in the `Web` tab has been cleared and needs to be manually migrated to `Developer Docs`.
### Features
* Extensive rework has been done on the Answer Engine search box.
- Developer Docs / Web search is now triggered by `@`.
* Starting from this version, we are utilizing websockets for features that require streaming (e.g., Answer Engine and Chat Side Panel). If you are deploying Tabby behind a reverse proxy, you may need to configure the proxy to support websockets (see the nginx sketch after this list).
* Discussion threads in the Answer Engine are now persisted, allowing users to share threads with others.
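For an nginx reverse proxy, this typically means forwarding the HTTP upgrade handshake on the routes Tabby serves. A minimal sketch, assuming Tabby listens on `localhost:8080`:

```nginx
location / {
    proxy_pass http://localhost:8080;
    # Forward the websocket upgrade, required for streaming features
    # such as the Answer Engine and the Chat Side Panel.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
}
```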
### Fixes and Improvements
* Fixed an issue where the llama-server subprocess was not being reused when reusing a model for Chat / Completion together (e.g., Codestral-22B) with the local model backend.
* Updated llama.cpp to version b3571 to support the Jina series of embedding models.
* Code search functionality is now available in the `Code Browser` tab. Users can search for code using regex patterns and filter by language, repository, and branch.
* Initial experimental support for natural language to codebase conversation in `Answer Engine`.
### Fixes and Improvements
* Added incremental indexing of issues / PRs by checking `updated_at`.
* Canonicalized `git_url` before performing relevant code search. Previously, for `git_url`s with credentials, the canonicalized `git_url` was used in the index, but queries still used the raw `git_url`.
* Bumped llama.cpp to b3370, which fixes inference for the Qwen2 model series.
* Introduced a new Home page featuring the Answer Engine, which activates when the chat model is loaded.
* Enhanced the Answer Engine's context by indexing issues and pull requests.
* Supported web page crawling to further enrich the Answer Engine's context.
* Enabled navigation through various git trees in the git browser.
### Fixes and Improvements
* Turned on SHA-256 checksum verification for model downloading.
* Added an environment variable `TABBY_HUGGINGFACE_HOST_OVERRIDE` to override `huggingface.co` with compatible mirrors (e.g., `hf-mirror.com`) for model downloading (see the example after this list).
* Bumped `llama.cpp` version to [b3166](https://github.com/ggerganov/llama.cpp/releases/tag/b3166).
* Improved logging for the `llama.cpp` backend.
* Added support for triggering background jobs in the admin UI.
* Enhanced logging for backend jobs in the admin UI.
* The `--webserver` flag is now enabled by default in `tabby serve`. To turn off the webserver and use only OSS features, pass the `--no-webserver` flag (see the example after this list).
* The `/v1beta/chat/completions` endpoint has moved to `/v1/chat/completions`; the old endpoint remains available for backward compatibility (see the curl sketch after this list).
* Models from our official registry can now be referred to without the `TabbyML/` prefix: for example, the model TabbyML/CodeLlama-7B can simply be referred to as CodeLlama-7B everywhere.
* Tabby now includes built-in user management and secure access, ensuring that it is only accessible to your team.
* The `--webserver` flag is a new addition to `tabby serve` that enables secure access to the Tabby server. When this flag is on, IDE extensions will need to provide an authorization token to access the instance.
- Some functionalities that are bound to the webserver (e.g., Playground) will also require the `--webserver` flag.
* The llama.cpp backend (CPU, Metal) now requires a re-download of GGUF models due to upstream format changes: https://github.com/TabbyML/tabby/pull/645 https://github.com/ggerganov/llama.cpp/pull/3252
* The CUDA backend has been switched to llama.cpp: https://github.com/TabbyML/tabby/pull/656
* The tokenizer implementation has been switched to llama.cpp, so Tabby no longer needs to download a separate tokenizer file: https://github.com/TabbyML/tabby/pull/683
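To illustrate the serve-related changes above, here is a sketch that combines them in a single invocation; the mirror host is only an example, and combining these options assumes they coexist in the version you are running:

```bash
# Download models through a compatible Hugging Face mirror.
export TABBY_HUGGINGFACE_HOST_OVERRIDE=hf-mirror.com

# The TabbyML/ prefix is optional for official models, and --no-webserver
# turns off the (now default) webserver to run with OSS features only.
tabby serve --model CodeLlama-7B --no-webserver
```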
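Likewise, a sketch of calling the relocated chat endpoint with curl; the request body assumes an OpenAI-style chat schema, so adjust the fields to match your deployment:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a hello world program in Rust."}
    ]
  }'
```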
We have introduced a new argument, `--chat-model`, which allows you to specify the model used by the chat playground (http://localhost:8080/playground).
To utilize this feature, use the following command in the terminal:
```bash
tabby serve --device metal --model TabbyML/StarCoder-1B --chat-model TabbyML/Mistral-7B
```
Mainland Chinese users have been facing challenges accessing Hugging Face for various reasons. The Tabby team is actively working to address this by mirroring models to modelscope.cn, a model-hosting provider in mainland China.
* Implemented more accurate UTF-8 incremental decoding ([pull request](https://github.com/TabbyML/tabby/pull/491)).
* Fixed the stop words implementation by utilizing RegexSet to isolate the stop word group.
* Improved model downloading logic; Tabby now attempts to fetch the latest model version when the remote has changed and the local cache key has become stale.