# Intel® AI for Enterprise Inference — Model Deployment User Guide

## Table of Contents

1. [Overview](#1-overview)
2. [Environment Prerequisites](#2-environment-prerequisites)
3. [Model Deployment Workflow](#3-model-deployment-workflow)
   - [Deploy Models from Enterprise Inference Catalog](#31-deploy-models-from-enterprise-inference-catalog)
   - [Deploy Models Directly from Hugging Face](#32-deploy-models-directly-from-hugging-face)
4. [Undeploy Models](#4-undeploy-models)
   - [Undeploy Models from Enterprise Inference Catalog](#41-undeploy-models-from-enterprise-inference-catalog)
   - [Undeploy Models Deployed from Hugging Face](#42-undeploy-models-deployed-from-hugging-face)

## 1. Overview

This guide outlines the standard procedure for deploying and undeploying models on an Enterprise Inference cluster using the `inference-stack-deploy.sh` script.

---

## 2. Environment Prerequisites

- **Host System:** Control plane or master node with access to the inference stack
- **Cluster Access:** Existing or newly provisioned Kubernetes cluster
- **Certificates:** Valid cluster certificate (`cert.pem`) and private key (`key.pem`)
- **Hugging Face Token:** Required for downloading models from Hugging Face
- **Script Path:** `~/Enterprise-Inference/core/inference-stack-deploy.sh`

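A quick way to confirm these prerequisites before starting is sketched below. The certificate and key paths are illustrative; point them at wherever your `cert.pem` and `key.pem` actually live.

```bash
# Confirm the deployment script is present on the control-plane/master node
ls -l ~/Enterprise-Inference/core/inference-stack-deploy.sh

# Confirm kubectl can reach the Kubernetes cluster
kubectl get nodes

# Confirm the certificate and key are usable and check the certificate expiry
# (paths are illustrative; adjust to your actual file locations)
openssl x509 -in cert.pem -noout -enddate
openssl rsa -in key.pem -check -noout
```
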
---

## 3. Model Deployment Workflow

Models can be deployed in two ways:

1. Deploy from the pre-integrated Enterprise Inference model catalog
2. Deploy directly from Hugging Face

Both paths use the same interactive script and menu flow.

### 3.1 Deploy Models from Enterprise Inference Catalog

This method deploys pre-integrated and validated models optimized for Enterprise Inference.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**1** – Deploy Model

**Step 3: Select Model to Deploy**

The script displays a list of available models and their corresponding numeric IDs based on the selected deployment type (CPU or Gaudi).

When prompted to `Enter numbers of models to deploy/remove (comma-separated)`, enter the model ID you want to deploy (example: `1`).

**Step 4: Confirm Deployment**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

**Once confirmed:**
- The model is deployed automatically to the inference cluster.
- All required Kubernetes Pods, Services, and Endpoints are created.

**Test:**

Run the following command to verify that the model pod is in the `Running` state:
```bash
kubectl get pods
```
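
Beyond pod status, you can also confirm that the supporting Kubernetes objects mentioned above were created. These are generic `kubectl` checks, not part of the deployment script; the resource names will vary by model.

```bash
# List Services and Endpoints in the current namespace; the newly deployed
# model should appear with a matching Service and populated Endpoints
kubectl get svc
kubectl get endpoints

# Optionally watch the pod until it reaches the Running state
kubectl get pods -w
```
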
---

### 3.2 Deploy Models Directly from Hugging Face

This option allows deploying any Hugging Face model, including models not pre-validated by Enterprise Inference.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**4** – Deploy Model from Hugging Face

**Step 3: Provide Hugging Face Model Details**

When prompted to `Enter the Hugging Face Model ID`, enter the desired Hugging Face model ID (example: `mistralai/Mistral-7B-v0.3`).

> **Note:** The model (`mistralai/Mistral-7B-v0.3`) above is only an example. You can enter any compatible Hugging Face model (CPU or Gaudi), depending on your deployment type.

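An optional pre-check, not part of the deployment script: if you want to confirm that a model ID exists and that your Hugging Face token can access it (some models are gated), you can query the public Hugging Face model API. The `HF_TOKEN` variable below is illustrative; substitute your own token.

```bash
# Illustrative pre-check: query the Hugging Face model API to confirm the
# model ID resolves and the token can access it. Adjust the ID and token.
MODEL_ID="mistralai/Mistral-7B-v0.3"
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer ${HF_TOKEN}" \
  "https://huggingface.co/api/models/${MODEL_ID}"
# 200 means the model is visible with this token; 401/403 usually indicates
# a missing token or a gated model whose terms have not been accepted.
```
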
**Step 4: Provide a Deployment Name for the Model**

When prompted to `Enter Deployment Name for the Model`, enter a deployment name for the model (example: `mistral-7b-v0-3`).

> **Naming rules:**
> - Lowercase letters only
> - Numbers and hyphens allowed
> - No spaces or special characters
> - Must follow Kubernetes naming conventions (a quick format check is sketched below)

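The rules above correspond to the RFC 1123 label format that Kubernetes object names must follow. If you want to sanity-check a candidate name before entering it, a minimal illustrative check (not part of the deployment script) looks like this:

```bash
# Illustrative check of a candidate deployment name against the RFC 1123
# label format: lowercase alphanumerics and hyphens, starting and ending
# with an alphanumeric, at most 63 characters long.
name="mistral-7b-v0-3"
if [[ "$name" =~ ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ ]] && [ "${#name}" -le 63 ]; then
  echo "valid deployment name: $name"
else
  echo "invalid deployment name: $name"
fi
```
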
**Step 5: Provide Tensor Parallel Size (Gaudi Only)**

Set the tensor parallel size based on the number of available Gaudi cards.

> **Note:** This option deploys a model that has not been pre-validated. Ensure the tensor parallel size is configured correctly. An incorrect value may cause the model to remain in a "Not Ready" state.

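If you are unsure how many Gaudi cards are available, the Habana `hl-smi` utility lists them, assuming the Habana/Gaudi software stack is installed on the worker node:

```bash
# On a Gaudi worker node with the Habana software stack installed, hl-smi
# lists the available accelerator cards; the card count is an upper bound
# for the tensor parallel size you enter at this prompt.
hl-smi
```
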
**Step 6: Confirm Deployment**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

**Test**

Run the following command to verify that the model pod is in the `Running` state:

```bash
kubectl get pods
```
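
Because models deployed this way are not pre-validated, the pod may stay in a non-`Running` or not-ready state (for example, if the tensor parallel size is wrong or the model download fails). Standard `kubectl` commands can help diagnose this; the pod name below is a placeholder for the name shown by `kubectl get pods`.

```bash
# Inspect a pod that is not reaching the Running state.
# Replace <model-pod-name> with the actual pod name.
kubectl describe pod <model-pod-name>   # events: scheduling, image pull, resource issues
kubectl logs <model-pod-name>           # model server logs, e.g. download or load errors
```
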
---

## 4. Undeploy Models

Enterprise Inference allows you to safely undeploy models that were deployed either from:
- The Enterprise Inference model catalog
- Hugging Face (direct deployment)

### 4.1 Undeploy Models from Enterprise Inference Catalog

This method applies to models deployed from the pre-integrated and validated Enterprise Inference catalog.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**2** – Undeploy Model

**Step 3: Select Model to Remove**

The script displays a list of available models with their model IDs based on the deployment type (CPU or Gaudi).

When prompted to `Enter numbers of models to deploy/remove (comma-separated)`, enter the model ID you want to remove (example: `1`).

**Step 4: Confirm Model Removal**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

> CAUTION: Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss.

**Once confirmed:**
- The model deployment is deleted
- All associated Kubernetes resources are removed

**Test**

Run the following command to confirm that the model pod has been deleted:
```bash
kubectl get pods
```
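
To double-check that nothing was left behind, you can also filter for the removed model's resources; `<model-name>` below is a placeholder for the model you removed.

```bash
# Check whether any pods or services still reference the removed model.
# If grep finds no matches, the fallback echo confirms nothing is left.
kubectl get pods,svc | grep -i "<model-name>" || echo "no resources found for <model-name>"
```
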
---

### 4.2 Undeploy Models Deployed from Hugging Face

Use this method to remove models that were deployed via the **Deploy Model from Hugging Face** option.

**Step 1: Run the Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**5** – Remove Model using deployment name

**Step 3: Provide Deployment Name**

When prompted to `Enter Deployment Name for the Model`, enter the deployment name that was used when the model was deployed (example: `mistral-7b-v0-3`).

> The deployment name must exactly match the name used during model deployment.

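If you are not sure of the exact name, listing the Kubernetes deployments in the cluster (a generic `kubectl` query, not a script feature) shows the names that were used:

```bash
# List deployments so you can copy the exact name used at deploy time;
# the --all-namespaces flag is illustrative, drop it if you work in one namespace.
kubectl get deployments --all-namespaces
```
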
**Step 4: Confirm Removal**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

> CAUTION: Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss.

**Once confirmed:**
- The model deployment is deleted
- All associated Kubernetes resources are removed

**Test**

Run the following command to confirm that the model pod has been deleted:
```bash
kubectl get pods
```