oneuptime/Llama
Simon Larsen 9b08d1a9e4 refactor: Convert job function to async in app.py
The job function in app.py has been converted to an async function to support asynchronous processing. This change improves the performance and responsiveness of the application by allowing other tasks to run concurrently while the job function is processing the queue.
2024-06-19 21:05:36 +00:00

Llama

Prepare

  • Download the models from Meta.
  • Once a model is downloaded, place it in the Llama/Models folder. Make sure tokenizer.model and tokenizer_checklist.chk are in the same folder.
  • Edit the Dockerfile to set the MODEL_NAME variable to the model name.
  • Build the Docker image:
docker build -t llama . -f ./Llama/Dockerfile 
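
Before building, it can help to confirm that the tokenizer files named above are actually in the Models folder. The following is a minimal sketch, not part of this repository: the helper name missing_model_files is hypothetical, and it assumes the script is run from the repository root so that Llama/Models resolves correctly.

```python
from pathlib import Path

# File names taken from the Prepare steps above.
REQUIRED_FILES = ("tokenizer.model", "tokenizer_checklist.chk")


def missing_model_files(models_dir):
    """Return the required file names that are absent from models_dir.

    Hypothetical helper for illustration; it only checks the tokenizer
    files, not the model weights themselves.
    """
    folder = Path(models_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumes the current directory is the repository root.
    missing = missing_model_files("Llama/Models")
    if missing:
        print("Missing from Llama/Models:", ", ".join(missing))
    else:
        print("Tokenizer files found in Llama/Models.")
```

An empty list means both tokenizer files are present. The model weights still need to be checked by hand, since their file names depend on which model was downloaded.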

Run

For Linux

docker run --gpus all -p 8547:8547 -it -v ./Llama/Models:/app/Models llama 

For macOS

docker run -p 8547:8547 -it -v ./Llama/Models:/app/Models llama 

Run without a Docker container

uvicorn app:app --host 0.0.0.0 --port 8547
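
Whichever way the server is started, a quick way to confirm it came up is to test whether anything is accepting TCP connections on port 8547. The sketch below uses only the Python standard library. The function name is_listening is hypothetical, and the check is deliberately limited to the TCP level because the HTTP routes exposed by app.py are not described here.

```python
import socket


def is_listening(host="127.0.0.1", port=8547, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it does not send any HTTP
    request, so it only shows that the port is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_listening():
        print("Something is listening on port 8547.")
    else:
        print("Nothing is listening on port 8547.")
```

Run it from the same machine after starting the container or uvicorn. A successful connection means only that the port is open, not that the model has finished loading.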