# Llama
### Development Guide
#### Step 1: Downloading Model from Hugging Face
Make sure you have Git LFS installed before cloning the model.
```bash
git lfs install
```
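If you are unsure whether Git LFS is already set up, a quick check:

```bash
# Prints the installed Git LFS version if it is on PATH
git lfs version
```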
```bash
cd ./Llama/Models
# Here we are downloading the Meta-Llama-3-8B-Instruct model
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
```
You will be asked for a username and password. Use your Hugging Face username as the username and a Hugging Face API token as the password.
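If you would rather enter the token only once, one option (standard Git behavior, not specific to this project) is to enable a credential helper before cloning:

```bash
# Cache HTTPS credentials so the token is only entered once.
# Note: 'store' saves the token in plain text in ~/.git-credentials.
git config --global credential.helper store
```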
#### Step 2: Install Docker
Install Docker and Docker Compose
```bash
sudo apt-get update
# Docker's convenience script installs the engine and the compose plugin
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```
2024-06-27 17:48:41 +00:00
Install Rootless Docker
```bash
sudo apt-get install -y uidmap
dockerd-rootless-setuptool.sh install
```
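The setup tool prints environment variables to add to your shell profile; the exact values vary per system, but a typical example looks like:

```bash
# Point the Docker CLI at the rootless daemon (use the values the setup tool prints)
export PATH=/usr/bin:$PATH
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock

# Start the rootless daemon now and keep it running across logouts
systemctl --user start docker
systemctl --user enable docker
sudo loginctl enable-linger $(whoami)
```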
Check that the installation works
```bash
docker --version
docker ps
# You should see no containers running, but you should not see any errors.
```
#### Step 3: Install NVIDIA drivers on the machine to use the GPU
- Install the NVIDIA Container Toolkit (then wire it into Docker, as shown in the sketch after this list): https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit
- Install CUDA: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network
- Restart the machine
- You should now see the GPU listed when you run `nvidia-smi`
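After installing the Container Toolkit, the linked guide has you register the NVIDIA runtime with Docker and restart the daemon:

```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# For rootless Docker, the guide uses a per-user config and daemon instead:
#   nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
#   systemctl --user restart docker
```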
#### Step 4: Run a test workload to verify the GPU is reachable from Docker
```bash
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```
The machine is now configured to use the GPU with Docker.
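For reference, when a container is started through Docker Compose, GPU access is requested in the compose file. A minimal sketch (the service name and layout here are hypothetical; this project's actual compose file may differ):

```yaml
services:
  llama:                        # hypothetical service name
    build: .
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all        # expose all GPUs to the container
              capabilities: [gpu]
```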
### Build
- Download the models from Meta
- Once the models are downloaded, place them in the `Llama/Models` folder. Make sure `tokenizer.model` and `tokenizer_checklist.chk` are in the same folder.
- Edit the `Dockerfile` to set the model name in the `MODEL_NAME` variable (see the sketch after this list)
- Build the Docker image:
```bash
npm run build-ai
```
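As mentioned in the list above, the `MODEL_NAME` edit is a one-line change. A hypothetical excerpt (the variable name comes from the repo's `Dockerfile`; the value shown is just an example):

```dockerfile
# Name of the model directory placed under Llama/Models (example value)
ENV MODEL_NAME=Meta-Llama-3-8B-Instruct
```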
### Run
```bash
npm run start-ai
```