Commit 0af42cb (parent: af6c020)

Author: Harika
Commit message: Adding model-deployment guide
Signed-off-by: Harika <codewith3@gmail.com>

5 files changed: 279 additions & 14 deletions

third_party/Dell/ubuntu-22.04/EI/single-node/troubleshooting.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -65,7 +65,7 @@ Two options:
 Deployment fails due to incorrect or missing configuration values.
 
 **Fix:**
-Before re-running deployment, verify and update your inference-config.cfg:
+Before re-running deployment, verify and update your inference-config.cfg. These values must match your actual deployment environment.
 ```bash
 cluster_url=api.example.com # <-- Replace with cluster url
 cert_file=~/certs/cert.pem
````
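The placeholder check described above can also be scripted before re-running the deployment. A minimal sketch; the `check_config` helper and its messages are illustrative, not part of the shipped tooling:

```shell
# Flag an inference-config.cfg whose cluster_url still carries the
# documented placeholder value instead of a real cluster URL.
check_config() {
  # $1 = path to inference-config.cfg
  if grep -q 'api.example.com' "$1"; then
    echo "cluster_url still has the placeholder value" >&2
    return 1
  fi
  echo "config looks customized"
}
```

Run it as `check_config ~/Enterprise-Inference/core/inventory/inference-config.cfg` before retrying the deployment.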

third_party/Dell/ubuntu-22.04/EI/single-node/user-guide-apisix.md

Lines changed: 8 additions & 7 deletions

````diff
@@ -8,10 +8,10 @@
 - [3. DNS and SSL/TLS Setup](#3-dns-and-ssltls-setup)
 - [4. Hugging Face Token Setup](#4-hugging-face-token-setup)
 - [Single Node Deployment Guide](#single-node-deployment-guide)
-  - [1. Configure the Setup Files and Environment](#2-configure-the-setup-files-and-environment)
-  - [2. Run the Deployment](#3-run-the-deployment)
-  - [3. Verify the Deployment](#4-verify-the-deployment)
-  - [4. Test the Inference](#5-test-the-inference)
+  - [1. Configure the Setup Files and Environment](#1-configure-the-setup-files-and-environment)
+  - [2. Run the Deployment](#2-run-the-deployment)
+  - [3. Verify the Deployment](#3-verify-the-deployment)
+  - [4. Test the Inference](#4-test-the-inference)
 - [Troubleshooting](#troubleshooting)
 - [Summary](#summary)
@@ -199,7 +199,7 @@ cp -f docs/examples/single-node/hosts.yaml core/inventory/hosts.yaml
 ### 2. Run the Deployment
 
 > **Note:**
-> The `--models` argument selects a model using its **numeric ID**
+> The `--models` argument allows you to specify one or more models by their numeric ID ([full list of available model IDs](../../iac/README.md#pre-integrated-models-list)).
 > If `--models` is omitted, the installer displays the full model list and prompts you to select a model interactively.
 
 Run the setup for Gaudi
@@ -248,14 +248,15 @@ Before generating the access token, ensure all Keycloak-related values are corre
 ```bash
 cd Enterprise-Inference/core/scripts
 chmod +x generate-token.sh
-./generate-token.sh
+. generate-token.sh
 ```
 
 **Verify the Token**
 
 After the script completes successfully, confirm that the token is available in your shell:
 
 ```bash
+echo $BASE_URL
 echo $TOKEN
 ```
@@ -302,4 +303,4 @@ This document provides common deployment and runtime issues observed during Inte
 - Configured SSH, DNS, and SSL
 - Generated your Hugging Face token
 - Deployed Intel® AI for Enterprise Inference
-- Tested a working model endpoint
+- Tested a working model endpoint
````
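Once `BASE_URL` and `TOKEN` are exported by `generate-token.sh`, the endpoint can be exercised directly. A hedged sketch: the `/v1/chat/completions` route and the model name below are assumptions based on the OpenAI-compatible API convention, not confirmed by this commit.

```shell
# Build an OpenAI-style chat payload; the model name is only an
# example -- use whichever model you actually deployed.
PAYLOAD='{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'

# Only call the endpoint when BASE_URL/TOKEN are actually set.
if [ -n "${BASE_URL:-}" ] && [ -n "${TOKEN:-}" ]; then
  curl -sk "${BASE_URL}/v1/chat/completions" \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -d "${PAYLOAD}"
fi
```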

third_party/Dell/ubuntu-22.04/iac/README.md

Lines changed: 47 additions & 1 deletion

````diff
@@ -134,7 +134,7 @@ sudo ./deploy-enterprise-inference.sh \
 | -p | OS userpassword |
 | -t | Hugging Face token |
 | -g | gaudi3 or cpu |
-| -m | Model IDs |
+| -m | Choose a model ID from the [Pre-Integrated Models List](#pre-integrated-models-list), based on your deployment type (gaudi or cpu) |
 | -b | Repo branch (default: release-1.4.0) |
 | -a | cluster -url |
 | -r | Resume from last checkpoint |
@@ -269,6 +269,52 @@ if EI is deployed with apisix, follow [Testing EI model with apisix](../EI/singl
 if EI is deployed with genai, follow [Testing EI model with genai](../EI/single-node/user-guide-genai.md#5-test-the-inference) for generating api-key and testing the inference
 
 ---
+## Additional Information
+
+### Pre-Integrated Models List
+
+Enterprise Inference provides a set of pre-integrated and validated models optimized for performance and stability. These models can be deployed directly using the Enterprise Inference catalog.
+
+**Pre-Integrated Gaudi Models**
+
+**Model ID** | **Model** |
+----------------|:------------------------------------------:|
+1 | meta-llama/Llama-3.1-8B-Instruct |
+2 | meta-llama/Llama-3.1-70B-Instruct |
+3 | meta-llama/Llama-3.1-405B-Instruct |
+4 | meta-llama/Llama-3.3-70B-Instruct |
+5 | meta-llama/Llama-4-Scout-17B-16E-Instruct |
+6 | Qwen/Qwen2.5-32B-Instruct |
+7 | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
+8 | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
+9 | mistralai/Mixtral-8x7B-Instruct-v0.1 |
+10 | mistralai/Mistral-7B-Instruct-v0.3 |
+11 | BAAI/bge-base-en-v1.5 |
+12 | BAAI/bge-reranker-base |
+13 | codellama/CodeLlama-34b-Instruct-hf |
+14 | tiiuae/Falcon3-7B-Instruct |
+
+**Pre-Integrated CPU Models**
+
+**Model ID** | **Model** |
+----------------|:------------------------------------------:|
+21 | meta-llama/Llama-3.1-8B-Instruct |
+22 | meta-llama/Llama-3.2-3B-Instruct |
+23 | deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
+24 | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B |
+25 | Qwen/Qwen3-1.7B |
+26 | Qwen/Qwen3-4B-Instruct-2507 |
+
+### Model Deployment
+
+If an Enterprise Inference cluster is already deployed, you can use the interactive deployment script to manage models, including:
+
+- Deploying additional models from the Enterprise Inference model catalog
+- Deploying custom models directly from Hugging Face
+- Undeploying existing models from the cluster
+
+Refer to the [Model Deployment guide](./model-deployment.md) and run the interactive `inference-stack-deploy.sh` script to perform these operations.
 
 ## Summary
````
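The ID-to-model mapping in the Gaudi table above can be captured as a small lookup helper. This function is illustrative only (it is not part of the installer); it simply mirrors the table:

```shell
# Map a numeric Gaudi catalog ID to its Hugging Face model name,
# per the Pre-Integrated Gaudi Models table.
gaudi_model_name() {
  case "$1" in
    1)  echo "meta-llama/Llama-3.1-8B-Instruct" ;;
    2)  echo "meta-llama/Llama-3.1-70B-Instruct" ;;
    3)  echo "meta-llama/Llama-3.1-405B-Instruct" ;;
    4)  echo "meta-llama/Llama-3.3-70B-Instruct" ;;
    5)  echo "meta-llama/Llama-4-Scout-17B-16E-Instruct" ;;
    6)  echo "Qwen/Qwen2.5-32B-Instruct" ;;
    7)  echo "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B" ;;
    8)  echo "deepseek-ai/DeepSeek-R1-Distill-Llama-8B" ;;
    9)  echo "mistralai/Mixtral-8x7B-Instruct-v0.1" ;;
    10) echo "mistralai/Mistral-7B-Instruct-v0.3" ;;
    11) echo "BAAI/bge-base-en-v1.5" ;;
    12) echo "BAAI/bge-reranker-base" ;;
    13) echo "codellama/CodeLlama-34b-Instruct-hf" ;;
    14) echo "tiiuae/Falcon3-7B-Instruct" ;;
    *)  echo "unknown model ID: $1" >&2; return 1 ;;
  esac
}
```

For example, `gaudi_model_name 1` prints `meta-llama/Llama-3.1-8B-Instruct`, the model selected by `-m 1` on a Gaudi deployment.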

third_party/Dell/ubuntu-22.04/iac/deploy-enterprise-inference.sh

Lines changed: 5 additions & 5 deletions

````diff
@@ -38,9 +38,9 @@ GPU_TYPE="Enter gaudi3/cpu based on your deployment"
 MODELS=""
 DEPLOYMENT_MODE="keycloak"
 DEPLOY_OBSERVABILITY="off"
-KEYCLOAK_CLIENT_ID="my-client-id"
-KEYCLOAK_ADMIN_USER="your-keycloak-admin-user"
-KEYCLOAK_ADMIN_PASSWORD="changeme"
+KEYCLOAK_CLIENT_ID="api"
+KEYCLOAK_ADMIN_USER="api-admin"
+KEYCLOAK_ADMIN_PASSWORD="changeme!!"
 FIRMWARE_VERSION="1.22.1"
 STATE_FILE="/tmp/ei-deploy.state"
 BRANCH="release-1.4.0"
@@ -792,14 +792,14 @@ main() {
 log_info "State file indicates a prior deployment; running interactively"
 CONFIG_FILE="/home/${USERNAME}/Enterprise-Inference/core/inventory/inference-config.cfg"
 update_inference_config
-su "${USERNAME}" -c "cd /home/${USERNAME}/Enterprise-Inference/core && bash ./inference-stack-deploy.sh --cpu-or-gpu '${GPU_TYPE}' --hugging-face-token ${HUGGINGFACE_TOKEN}" || {
+sudo -u "${USERNAME}" -H bash -c "cd /home/${USERNAME}/Enterprise-Inference/core && bash ./inference-stack-deploy.sh --cpu-or-gpu '${GPU_TYPE}' --hugging-face-token ${HUGGINGFACE_TOKEN}" || {
 log_error "Enterprise Inference Stack deployment failed!"
 log_warn "You can resume by running this script again with -r flag"
 exit 1
 }
 else
 # Using echo to provide input: "1" for "Provision Enterprise Inference Cluster", "yes" for confirmation
-su "${USERNAME}" -c "cd /home/${USERNAME}/Enterprise-Inference/core && echo -e '1\n${MODELS}\nyes' | bash ./inference-stack-deploy.sh --models '${MODELS}' --cpu-or-gpu '${GPU_TYPE}' --hugging-face-token ${HUGGINGFACE_TOKEN}" || {
+sudo -u "${USERNAME}" -H bash -c "cd /home/${USERNAME}/Enterprise-Inference/core && echo -e '1\n${MODELS}\nyes' | bash ./inference-stack-deploy.sh --models '${MODELS}' --cpu-or-gpu '${GPU_TYPE}' --hugging-face-token ${HUGGINGFACE_TOKEN}" || {
 log_error "Enterprise Inference Stack deployment failed!"
 log_warn "You can resume by running this script again with -r flag"
 exit 1
````
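The `echo -e '1\n${MODELS}\nyes' | …` pattern in the diff above works because each interactive `read` in the target script consumes one line from the pipe, in order. A minimal stand-in sketch; the `answer_prompts` function is illustrative, not the real installer:

```shell
# Each `read` consumes one piped line, mimicking the installer's
# three interactive prompts.
answer_prompts() {
  read -r choice    # menu selection ("1" = Provision Enterprise Inference Cluster)
  read -r models    # comma-separated model IDs
  read -r confirm   # "yes" to proceed
  echo "choice=$choice models=$models confirm=$confirm"
}

printf '1\n2,5\nyes\n' | answer_prompts
# prints: choice=1 models=2,5 confirm=yes
```

If the script ever adds a prompt, the piped answers fall out of step, which is why the non-interactive path only supplies input when `--models` is given up front.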
third_party/Dell/ubuntu-22.04/iac/model-deployment.md (new file)

Lines changed: 218 additions & 0 deletions

# Intel® AI for Enterprise Inference — Model Deployment User Guide

## Table of Contents

1. [Overview](#1-overview)
2. [Environment Prerequisites](#2-environment-prerequisites)
3. [Model Deployment Workflow](#3-model-deployment-workflow)
   - [Deploy Models from Enterprise Inference Catalog](#31-deploy-models-from-enterprise-inference-catalog)
   - [Deploy Models Directly from Hugging Face](#32-deploy-models-directly-from-hugging-face)
4. [Undeploy Models](#4-undeploy-models)
   - [Undeploy Models from Enterprise Inference Catalog](#41-undeploy-models-from-enterprise-inference-catalog)
   - [Undeploy Models Deployed from Hugging Face](#42-undeploy-models-deployed-from-hugging-face)

## 1. Overview

This guide outlines the standard procedure for deploying models on an Enterprise Inference cluster using the `inference-stack-deploy.sh` script.

---

## 2. Environment Prerequisites

- **Host System:** Control plane or master node with access to the inference stack
- **Cluster Access:** Existing or newly provisioned Kubernetes cluster
- **Certificates:** Valid cluster certificate (`cert.pem`) and private key (`key.pem`)
- **Hugging Face Token:** Required for downloading models from Hugging Face
- **Script Path:** `~/Enterprise-Inference/core/inference-stack-deploy.sh`

---

## 3. Model Deployment Workflow

There are two ways to deploy models:

1. Deploy from the pre-integrated Enterprise Inference model catalog
2. Deploy directly from Hugging Face

Both use the same interactive script and menu flow.

### 3.1 Deploy Models from Enterprise Inference Catalog

This method deploys pre-integrated and validated models optimized for Enterprise Inference.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**1** – Deploy Model

**Step 3: Select Model to Deploy**

The script displays a list of available models and their corresponding numeric IDs based on the selected deployment type (CPU or Gaudi).

When prompted to `Enter numbers of models to deploy/remove (comma-separated)`, enter the model ID you want to deploy (example: `1`).

**Step 4: Confirm Deployment**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

**Once confirmed:**
- The model is deployed automatically to the inference cluster.
- All required Kubernetes Pods, Services, and Endpoints are created.

**Test:**

Run the following command to verify that the model pod is in the `Running` state:

```bash
kubectl get pods
```
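The `kubectl get pods` check can also be done non-interactively. A sketch, assuming the pod name begins with the deployment name; the `pod_running` helper and the sample output are illustrative:

```shell
# Report success when a pod whose name starts with the given prefix
# shows STATUS=Running. In practice, pipe live `kubectl get pods`
# output into this function.
pod_running() {
  grep -E "^$1" | grep -q 'Running'
}

# Sample output stands in for a live cluster here.
printf 'NAME READY STATUS RESTARTS AGE\nllama-8b-7f9c 1/1 Running 0 5m\n' \
  | pod_running llama-8b && echo "model pod is Running"
```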
---

### 3.2 Deploy Models Directly from Hugging Face

This option allows deploying any Hugging Face model, including models not pre-validated by Enterprise Inference.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**4** – Deploy Model from Hugging Face

**Step 3: Provide Hugging Face Model Details**

When prompted to `Enter the Hugging Face Model ID`, enter the desired Hugging Face model ID (example: `mistralai/Mistral-7B-v0.3`).

> Note: The model above (`mistralai/Mistral-7B-v0.3`) is only an example. You can enter any compatible Hugging Face model (CPU or Gaudi), depending on your deployment type.

**Step 4: Provide a Deployment Name for the Model**

When prompted to `Enter Deployment Name for the Model`, provide a deployment name (example: `mistral-7b-v0-3`).

> **Naming rules:**
> - Lowercase letters only
> - Numbers and hyphens allowed
> - No spaces or special characters
> - Must follow Kubernetes naming conventions
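The naming rules above correspond to Kubernetes' RFC 1123 label convention, which can be checked before deploying. A sketch; the `valid_deployment_name` helper is illustrative:

```shell
# RFC 1123 label check, as Kubernetes applies to resource names:
# lowercase alphanumerics and hyphens, must start and end with an
# alphanumeric character, at most 63 characters total.
valid_deployment_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
}

valid_deployment_name "mistral-7b-v0-3" && echo "ok"    # valid
valid_deployment_name "Mistral_7B" || echo "rejected"   # uppercase/underscore
```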
**Step 5: Provide Tensor Parallel Size (Gaudi Only)**

Set the tensor parallel size based on the available Gaudi cards.

> **Note:** This option deploys a model that has not been pre-validated. Ensure the tensor parallel size is configured correctly; an incorrect value may cause the model to remain in a "Not Ready" state.

**Step 6: Confirm Deployment**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

**Test**

Run the following command to verify that the model pod is in the `Running` state:

```bash
kubectl get pods
```
---

## 4. Undeploy Models

Enterprise Inference allows you to safely undeploy models that were deployed either from:
- The Enterprise Inference model catalog
- Directly from Hugging Face

### 4.1 Undeploy Models from Enterprise Inference Catalog

This method is used for models deployed from the pre-integrated and validated Enterprise Inference catalog.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**2** – Undeploy Model

**Step 3: Select Model to Remove**

The script displays a list of available models with their model IDs based on the deployment type (CPU or Gaudi).

When prompted to `Enter numbers of models to deploy/remove (comma-separated)`, enter the model ID you want to remove (example: `1`).

**Step 4: Confirm Model Removal**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

> **CAUTION:** Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss.

**Once confirmed:**
- The model deployment is deleted
- All associated Kubernetes resources are removed

**Test**

Run the following command to confirm that the model pod has been deleted:

```bash
kubectl get pods
```
---

### 4.2 Undeploy Models Deployed from Hugging Face

Use this method to remove models deployed via **Deploy Model from Hugging Face**.

**Step 1: Run the Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**5** – Remove Model using deployment name

**Step 3: Provide Deployment Name**

When prompted to `Enter Deployment Name for the Model`, provide the deployment name (example: `mistral-7b-v0-3`).

> The deployment name must exactly match the name used during model deployment.

**Step 4: Confirm Removal**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

> **CAUTION:** Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss.

**Once confirmed:**
- The model deployment is deleted
- All associated Kubernetes resources are removed

**Test**

Run the following command to confirm that the model pod has been deleted:

```bash
kubectl get pods
```
