Llama

Prepare

  • Download the models from Meta.
  • Once the models are downloaded, place them in the Llama/Models folder. Make sure you also place tokenizer.model and tokenizer_checklist.chk in the same folder (see the example layout after the build command below).
  • Edit the Dockerfile to set the model name in the MODEL_NAME variable.
  • Build the Docker image:
docker build -t llama . -f ./Llama/Dockerfile 
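
For reference, after these steps the Models folder might look like the listing below. The llama-2-7b directory name is only an example; use whichever model you downloaded, presumably matching the MODEL_NAME value set in the Dockerfile.

ls ./Llama/Models
llama-2-7b  tokenizer.model  tokenizer_checklist.chk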

Run

For Linux

docker run --gpus all -p 8547:8547 -it -v ./Llama/Models:/app/Models llama 
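
Here --gpus all passes the host GPUs through to the container, which requires the NVIDIA drivers and the NVIDIA Container Toolkit on the Linux host. The -v flag mounts the local Models folder into the container at /app/Models, and -p publishes the app's port 8547.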

For macOS

docker run -p 8547:8547 -it -v ./Llama/Models:/app/Models llama 
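
The macOS command omits --gpus all because Docker Desktop on macOS does not pass GPUs through to containers, so inference runs on the CPU there.

Once the container is running, a quick reachability check looks like the command below. The actual API routes are defined in app.py, so even a 404 response here still confirms the server is listening on the port.

curl -v http://localhost:8547/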

Run without a Docker container
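
Install the Python dependencies first (assuming the requirements.txt in this folder lists everything app.py needs):

pip install -r requirements.txt

Then start the server: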

uvicorn app:app --host 0.0.0.0 --port 8547