oneuptime/Llama
2024-06-27 17:48:41 +00:00

Llama

Development Guide

Step 1: Downloading Model from Hugging Face

Make sure Git LFS is installed before cloning the model. Without it, the clone contains only small pointer files instead of the model weights.

git lfs install
cd ./Llama/Models
# Here we are downloading the Meta-Llama-3-8B-Instruct model
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

You will be prompted for a username and password. Use your Hugging Face username as the username and your Hugging Face API token as the password.
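
If Git LFS was not active during the clone, the weight files are left as small text stubs. The following is a minimal sketch for spotting that case, run from the Llama/Models folder. It assumes the weights are stored as `*.safetensors` files; `is_lfs_pointer` and `MODEL_DIR` are hypothetical names introduced here for illustration.

```shell
# Return 0 if the file is still a Git LFS pointer (a small text stub), 1 otherwise.
is_lfs_pointer() {
  # Pointer files begin with this spec line; real weight files are binary.
  head -c 42 "$1" 2>/dev/null | grep -qxF "version https://git-lfs.github.com/spec/v1"
}

# Assumption: the clone from the step above sits in the current folder.
MODEL_DIR="${MODEL_DIR:-./Meta-Llama-3-8B-Instruct}"
for f in "$MODEL_DIR"/*.safetensors; do
  [ -e "$f" ] || continue   # nothing matched the glob
  if is_lfs_pointer "$f"; then
    echo "still an LFS pointer: $f (try: git lfs pull)"
  fi
done
```

If any file is reported, running `git lfs pull` inside the cloned folder should fetch the real contents.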

Step 2: Install Docker

Install Docker and Docker Compose

sudo apt-get update
# Run the install script (not curl) with elevated privileges
curl -fsSL https://get.docker.com/ | sudo sh

Install Rootless Docker

sudo apt-get install -y uidmap
dockerd-rootless-setuptool.sh install
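
In rootless mode the Docker client has to be pointed at the per-user daemon socket. The setup tool prints the exact variables to export, and its output should take precedence. The sketch below only shows the conventional socket location; `rootless_docker_host` is a hypothetical helper name, and the fallback path under /run/user is an assumption about a typical systemd-based setup.

```shell
# Print the socket URL a rootless Docker daemon conventionally listens on.
rootless_docker_host() {
  # XDG_RUNTIME_DIR is normally /run/user/<uid> on systemd-based systems.
  echo "unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock"
}

# Only export this if you are actually using rootless mode.
export DOCKER_HOST="$(rootless_docker_host)"
echo "DOCKER_HOST=$DOCKER_HOST"
```

Adding the export to your shell profile keeps it across sessions.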

Verify that the installation works

docker --version
docker ps 

# No containers should be listed, and no errors should be printed.

Step 3: Install NVIDIA drivers on the machine to use the GPU
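
This step lists no commands, so here is a minimal sketch for confirming that a driver is present once it is installed. It relies on `nvidia-smi`, which ships with the NVIDIA driver; `check_gpu_driver` is a hypothetical helper name, not part of this repository.

```shell
# Report the GPUs the driver can see, or fail if the driver tooling is absent.
check_gpu_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: the NVIDIA driver does not appear to be installed" >&2
    return 1
  fi
  # Prints one line per GPU: name and driver version.
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

check_gpu_driver || echo "GPU driver check failed"
```

If no GPU is listed, the Docker workload in the next step is unlikely to succeed.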

Step 4: Run a test workload to check that Docker can access the GPU

docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

If the benchmark runs without errors, the machine is configured to use the GPU with Docker.

Build

  • Download the models from Meta.
  • Once the model is downloaded, place its files in the Llama/Models folder. Make sure tokenizer.model and tokenizer_checklist.chk are in the same folder.
  • Edit Dockerfile to include the model name in the MODEL_NAME variable.
  • Build the Docker image:
npm run build-ai
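
The file checks in the list above can be sketched as a small pre-build script. It assumes the command is run from the repository root so that `./Llama/Models` resolves; `check_model_dir` is a hypothetical helper name, and the script checks only the two tokenizer files named above, not the weights themselves.

```shell
# Return 0 if both tokenizer files are present in the given folder, 1 otherwise.
check_model_dir() {
  dir="$1"
  missing=0
  for f in tokenizer.model tokenizer_checklist.chk; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumption: run from the repository root.
check_model_dir ./Llama/Models || echo "fix the model folder before running: npm run build-ai"
```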

Run

npm run start-ai