This repository was archived by the owner on Sep 18, 2025. It is now read-only.

Commit a7916e5

Updated to 1.21.1, Multiple fixes and linted with ruff (#140)
1 parent f6169e5 commit a7916e5

14 files changed

Lines changed: 402 additions & 379 deletions

PyTorch/vLLM_Tutorials/Deploying_vLLM/Dockerfile-1.21.0-ub22-vllm-v0.7.2+Gaudi

Lines changed: 0 additions & 39 deletions
This file was deleted.

PyTorch/vLLM_Tutorials/Deploying_vLLM/Dockerfile-1.21.0-ub24-vllm-v0.7.2+Gaudi

Lines changed: 0 additions & 39 deletions
This file was deleted.
PyTorch/vLLM_Tutorials/Deploying_vLLM/Dockerfile-1.21.1-ub22-vllm-v0.7.2+Gaudi

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Parameterize base image components
+ARG DOCKER_URL=vault.habana.ai/gaudi-docker
+ARG VERSION=1.21.1
+ARG BASE_NAME=ubuntu22.04
+ARG PT_VERSION=2.6.0
+ARG REVISION=latest
+ARG REPO_TYPE=habanalabs
+# Parameterize commit/branch for vllm-fork checkout
+ARG VLLM_FORK_COMMIT=v0.7.2+Gaudi-1.21.0
+
+FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-installer-${PT_VERSION}:${REVISION}
+
+ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
+
+RUN apt update && \
+    apt install -y gettext moreutils jq && \
+    ln -sf /usr/bin/python3 /usr/bin/python
+WORKDIR /root
+
+# Install vllm-fork inside the container
+ENV VLLM_TARGET_DEVICE=hpu
+RUN git clone https://github.com/HabanaAI/vllm-fork.git && \
+    cd vllm-fork && \
+    git checkout ${VLLM_FORK_COMMIT} && \
+    pip install -v -e .
+
+# Install additional Python packages
+RUN pip install datasets && \
+    pip install pandas
+
+# Copy utility scripts and configuration
+RUN mkdir -p /root/scripts
+COPY entrypoint.sh vllm_autocalc.py settings_vllm.csv template_vllm_server.sh varlist* perftest* /root/scripts
+RUN chmod +x /root/scripts/*.sh
+WORKDIR /root/scripts
+
+# Set entrypoint script
+ENTRYPOINT ["/root/scripts/entrypoint.sh"]
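
Note: because the base image components are declared as ARGs, they can be overridden at build time without editing the Dockerfile. A minimal sketch, assuming the file name above; the ub22 image tag below is illustrative and not one used in this commit:

```bash
# Build the Ubuntu 22.04 image, overriding selected build arguments.
# The tag vllm-v0.7.2-gaudi-ub22:1.21.1-16 is a placeholder; pick your own.
docker build \
  -f Dockerfile-1.21.1-ub22-vllm-v0.7.2+Gaudi \
  --build-arg REVISION=latest \
  --build-arg VLLM_FORK_COMMIT=v0.7.2+Gaudi-1.21.0 \
  --build-arg http_proxy --build-arg https_proxy --build-arg no_proxy \
  -t vllm-v0.7.2-gaudi-ub22:1.21.1-16 .
```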
PyTorch/vLLM_Tutorials/Deploying_vLLM/Dockerfile-1.21.1-ub24-vllm-v0.7.2+Gaudi

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Parameterize base image components
+ARG DOCKER_URL=vault.habana.ai/gaudi-docker
+ARG VERSION=1.21.1
+ARG BASE_NAME=ubuntu24.04
+ARG PT_VERSION=2.6.0
+ARG REVISION=latest
+ARG REPO_TYPE=habanalabs
+# Parameterize commit/branch for vllm-fork checkout
+ARG VLLM_FORK_COMMIT=v0.7.2+Gaudi-1.21.0
+
+FROM ${DOCKER_URL}/${VERSION}/${BASE_NAME}/${REPO_TYPE}/pytorch-installer-${PT_VERSION}:${REVISION}
+
+ENV OMPI_MCA_btl_vader_single_copy_mechanism=none
+
+RUN apt update && \
+    apt install -y gettext moreutils jq && \
+    ln -sf /usr/bin/python3 /usr/bin/python
+WORKDIR /root
+
+# Install vllm-fork inside the container
+ENV VLLM_TARGET_DEVICE=hpu
+RUN git clone https://github.com/HabanaAI/vllm-fork.git && \
+    cd vllm-fork && \
+    git checkout ${VLLM_FORK_COMMIT} && \
+    pip install -v -e .
+
+# Install additional Python packages
+RUN pip install datasets && \
+    pip install pandas
+
+# Copy utility scripts and configuration
+RUN mkdir -p /root/scripts
+COPY entrypoint.sh vllm_autocalc.py settings_vllm.csv template_vllm_server.sh varlist* perftest* /root/scripts
+RUN chmod +x /root/scripts/*.sh
+WORKDIR /root/scripts
+
+# Set entrypoint script
+ENTRYPOINT ["/root/scripts/entrypoint.sh"]
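
Note: a quick sanity check of the vllm-fork install baked into either image is to query pip metadata; this needs no Gaudi devices. A sketch, assuming the ub24 tag built in the README example below:

```bash
# Print the vLLM package version recorded by the editable install inside the image.
docker run --rm --entrypoint pip vllm-v0.7.2-gaudi-ub24:1.21.1-16 show vllm
```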

PyTorch/vLLM_Tutorials/Deploying_vLLM/README.md

Lines changed: 29 additions & 24 deletions
@@ -18,7 +18,6 @@ This folder contains scripts and configuration files that can be used to build a
 |Qwen/Qwen2.5-32B-Instruct |1|
 |Qwen/Qwen2.5-72B-Instruct |4|
 |Qwen/Qwen2.5-7B-Instruct |1|
-
 ## Quick Start
 To run these models on your Gaudi machine:
@@ -27,19 +26,25 @@ To run these models on your Gaudi machine:
 git clone https://github.com/HabanaAI/Gaudi-tutorials
 cd Gaudi-tutorials/PyTorch/vLLM_Tutorials/Deploying_vLLM
 ```
+
+> **IMPORTANT**
+>
+> **All build and run steps listed in this document need to be executed on Gaudi Hardware**
+>
+
 2) Depending on the base OS you are running, select the appropriate Dockerfile. The examples in this page are for Ubuntu 24.04
-- Ubuntu 22.04: Dockerfile-1.21.0-ub22-vllm-v0.7.2+Gaudi
-- Ubuntu 24.04: Dockerfile-1.21.0-ub24-vllm-v0.7.2+Gaudi
+- Ubuntu 22.04: Dockerfile-1.21.1-ub22-vllm-v0.7.2+Gaudi
+- Ubuntu 24.04: Dockerfile-1.21.1-ub24-vllm-v0.7.2+Gaudi
 
 3) To build the `vllm-v0.7.2-gaudi` image from the Dockerfile, use the command below.
 ```bash
 ## Set the next line if you are using a HTTP proxy on your build machine
 BUILD_ARGS="--build-arg http_proxy --build-arg https_proxy --build-arg no_proxy"
-docker build -f Dockerfile-1.21.0-ub24-vllm-v0.7.2+Gaudi $BUILD_ARGS -t vllm-v0.7.2-gaudi-ub24:1.21.0-555 .
+docker build -f Dockerfile-1.21.1-ub24-vllm-v0.7.2+Gaudi $BUILD_ARGS -t vllm-v0.7.2-gaudi-ub24:1.21.1-16 .
 ```
 
 4) Set the following variables with appropriate values
-- -e model= (choose from table above)
+- -e MODEL= (choose from table above)
 - -e HF_TOKEN= (Generate a token from https://huggingface.co)
 
 > Note:
@@ -58,14 +63,14 @@ docker run -it --rm \
 -e HF_TOKEN=YOUR_TOKEN_HERE \
 -e HABANA_VISIBLE_DEVICES=all \
 -p 8000:8000 \
--e model=meta-llama/Llama-3.1-8B-Instruct \
+-e MODEL=meta-llama/Llama-3.1-8B-Instruct \
 --name vllm-server \
-vllm-v0.7.2-gaudi-ub24:1.21.0-555
+vllm-v0.7.2-gaudi-ub24:1.21.1-16
 ```
 
 6) (Optional) check your vLLM server by running this command in a **separate terminal**
 ```bash
-model=meta-llama/Llama-3.1-8B-Instruct
+MODEL=meta-llama/Llama-3.1-8B-Instruct
 target=localhost
 curl_query="What is DeepLearning?"
 payload="{ \"model\": \"${model}\", \"prompt\": \"${curl_query}\", \"max_tokens\": 128, \"temperature\": 0 }"
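
Note: the hunk ends at the payload definition, so the rest of the check sits outside the visible diff. A minimal sketch of completing it, assuming the server exposes vLLM's OpenAI-compatible /v1/completions endpoint on port 8000 as in the run example above; since the payload line still interpolates the lower-case ${model}, the sketch mirrors it from MODEL:

```bash
MODEL=meta-llama/Llama-3.1-8B-Instruct
model=$MODEL   # the payload below expands ${model}, so mirror the renamed variable
target=localhost
curl_query="What is DeepLearning?"
payload="{ \"model\": \"${model}\", \"prompt\": \"${curl_query}\", \"max_tokens\": 128, \"temperature\": 0 }"

# Send the completion request and print the raw JSON response.
curl -s --noproxy '*' -X POST http://${target}:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d "$payload"
```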
@@ -132,7 +137,7 @@ P90 ITL (ms): 61.32
 </pre>
 
 > Note:
-> The perftest.sh script runs with the following defaults
+> The perftest.sh script runs with the following defaults:
 > INPUT_TOKENS=2048
 > OUTPUT_TOKENS=2048
 > CONCURRENT_REQUESTS=64
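
Note: the defaults above suggest perftest.sh can be invoked without arguments; the positional order below is assumed from the custom invocation shown in the next hunk (INPUT_TOKENS OUTPUT_TOKENS CONCURRENT_REQUESTS):

```bash
# Run with the documented defaults (2048 / 2048 / 64).
docker exec vllm-server /root/scripts/perftest.sh

# Equivalent explicit invocation, assuming the positional order above.
docker exec vllm-server /root/scripts/perftest.sh 2048 2048 64
```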
@@ -148,10 +153,10 @@ docker exec vllm-server /root/scripts/perftest.sh 1024 3192 100
 
 # Running vLLM server with custom parameters
 1) The following variables come with defaults but can be overridden with appropriate values
-- -e tensor_parallel_size (Optional number of cards to use. If not set, a default will be chosen)
-- -e max_model_len (Optional, set a length that suits your workload. If not set, a default will be chosen)
+- -e TENSOR_PARALLEL_SIZE (Optional, number of cards to use. If not set, a default will be chosen)
+- -e MAX_MODEL_LEN (Optional, set a length that suits your workload. If not set, a default will be chosen)
 
-2) Example for bringing up a vLLM server with a custom max model length and tensor parallel size. Proxy variables and volumes added for reference.
+2) Example for bringing up a vLLM server with a custom max model length and tensor parallel (TP) size. Proxy variables and volumes added for reference.
 ```bash
 docker run -it --rm \
 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \
@@ -163,16 +168,16 @@ docker run -it --rm \
 -e HF_TOKEN=YOUR_TOKEN_HERE \
 -e HABANA_VISIBLE_DEVICES=all \
 -p 8000:8000 \
--e model=meta-llama/Llama-3.1-70B-Instruct \
--e tensor_parallel_size=8 \
--e max_model_len=8192 \
+-e MODEL=meta-llama/Llama-3.1-70B-Instruct \
+-e TENSOR_PARALLEL_SIZE=8 \
+-e MAX_MODEL_LEN=8192 \
 --name vllm-server \
-vllm-v0.7.2-gaudi-ub24:1.21.0-555
+vllm-v0.7.2-gaudi-ub24:1.21.1-16
 ```
 3) Example for bringing up two Llama-70B instances with the recommended number of TP/cards. Each instance should have unique values for HABANA_VISIBLE_DEVICES, host port and instance name.
 For information on how to set HABANA_VISIBLE_DEVICES for a specific TP size, see [docs.habana.ai - Multiple Tenants](https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html)
 ```
-CNAME=vllm-v0.7.2-gaudi-ub24:1.21.0-555
+CNAME=vllm-v0.7.2-gaudi-ub24:1.21.1-16
 HOST_PORT1=8000
 docker run -it --rm \
 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \
@@ -184,16 +189,16 @@ docker run -it --rm \
 -e HF_TOKEN=YOUR_TOKEN_HERE \
 -e HABANA_VISIBLE_DEVICES=0,1,2,3 \
 -p $HOST_PORT1:8000 \
--e model=meta-llama/Llama-3.1-70B-Instruct \
--e tensor_parallel_size=4 \
--e max_model_len=8192 \
+-e MODEL=meta-llama/Llama-3.1-70B-Instruct \
+-e TENSOR_PARALLEL_SIZE=4 \
+-e MAX_MODEL_LEN=8192 \
 --name vllm-server1 \
 ${CNAME}
 ```
 
 ```
 ## Run in Separate terminal
-CNAME=vllm-v0.7.2-gaudi-ub24:1.21.0-555
+CNAME=vllm-v0.7.2-gaudi-ub24:1.21.1-16
 HOST_PORT2=9222
 docker run -it --rm \
 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e no_proxy=$no_proxy \
@@ -205,9 +210,9 @@ docker run -it --rm \
 -e HF_TOKEN=YOUR_TOKEN_HERE \
 -e HABANA_VISIBLE_DEVICES=4,5,6,7
 -p $HOST_PORT2:8000 \
--e model=meta-llama/Llama-3.1-70B-Instruct \
--e tensor_parallel_size=4 \
--e max_model_len=8192 \
+-e MODEL=meta-llama/Llama-3.1-70B-Instruct \
+-e TENSOR_PARALLEL_SIZE=4 \
+-e MAX_MODEL_LEN=8192 \
 --name vllm-server2 \
 ${CNAME}
 ```

PyTorch/vLLM_Tutorials/Deploying_vLLM/check_vllm.sh

Lines changed: 0 additions & 8 deletions
This file was deleted.

PyTorch/vLLM_Tutorials/Deploying_vLLM/entrypoint.sh

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ LOG_FILE=$LOG_DIR/$LOG_FILE
 HF_HOME="${HF_HOME:-/root/.cache/huggingface}"
 export HF_HOME
 
-python3 generate_vars.py settings_vllm.csv
+python3 vllm_autocalc.py settings_vllm.csv
 if [[ $? -ne 0 ]]; then
     echo "Settings Error. Exiting!"
     exit -1
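
Note: entrypoint.sh defaults HF_HOME to /root/.cache/huggingface inside the container, so downloaded weights are lost when the container is removed. A sketch of persisting the cache with a bind mount; /data/hf_cache is a placeholder host path, and any runtime flags from the README's full run example should be kept:

```bash
# Reuse downloaded model weights across container restarts.
docker run -it --rm \
  -e HF_TOKEN=YOUR_TOKEN_HERE \
  -e HABANA_VISIBLE_DEVICES=all \
  -e MODEL=meta-llama/Llama-3.1-8B-Instruct \
  -v /data/hf_cache:/root/.cache/huggingface \
  -p 8000:8000 \
  --name vllm-server \
  vllm-v0.7.2-gaudi-ub24:1.21.1-16
```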
