# Intel® AI for Enterprise Inference — Model Deployment User Guide

## Table of Contents

1. [Overview](#1-overview)
2. [Environment Prerequisites](#2-environment-prerequisites)
3. [Model Deployment Workflow](#3-model-deployment-workflow)
   - [Deploy Models from Enterprise Inference Catalog](#31-deploy-models-from-enterprise-inference-catalog)
   - [Deploy Models Directly from Hugging Face](#32-deploy-models-directly-from-hugging-face)
4. [Undeploy Models](#4-undeploy-models)
   - [Undeploy Models from Enterprise Inference Catalog](#41-undeploy-models-from-enterprise-inference-catalog)
   - [Undeploy Models Deployed from Hugging Face](#42-undeploy-models-deployed-from-hugging-face)

## 1. Overview

This guide outlines the standard procedure for deploying and undeploying models on an Enterprise Inference cluster using the `inference-stack-deploy.sh` script.

---

## 2. Environment Prerequisites

- **Host System:** Control plane or master node with access to the inference stack
- **Cluster Access:** Existing or newly provisioned Kubernetes cluster
- **Certificates:** Valid cluster certificate (`cert.pem`) and private key (`key.pem`)
- **Hugging Face Token:** Required for downloading models from Hugging Face
- **Script Path:** `~/Enterprise-Inference/core/inference-stack-deploy.sh`

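A quick way to confirm these prerequisites before starting is sketched below. The certificate and key paths are illustrative; point them at wherever your `cert.pem` and `key.pem` actually live.

```bash
# Confirm the deployment script is present on the control-plane/master node
ls -l ~/Enterprise-Inference/core/inference-stack-deploy.sh

# Confirm kubectl can reach the Kubernetes cluster
kubectl get nodes

# Confirm the certificate and key are usable and check the certificate expiry
# (paths are illustrative; adjust to your actual file locations)
openssl x509 -in cert.pem -noout -enddate
openssl rsa -in key.pem -check -noout
```
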
---

## 3. Model Deployment Workflow

Models can be deployed in two ways:

1. Deploy from the pre-integrated Enterprise Inference model catalog
2. Deploy directly from Hugging Face

Both paths use the same interactive script and menu flow.

### 3.1 Deploy Models from Enterprise Inference Catalog

This method deploys pre-integrated and validated models optimized for Enterprise Inference.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**1** – Deploy Model

**Step 3: Select Model to Deploy**

The script displays a list of available models and their corresponding numeric IDs based on the selected deployment type (CPU or Gaudi).

When prompted to `Enter numbers of models to deploy/remove (comma-separated)`, enter the model ID you want to deploy (example: `1`).

**Step 4: Confirm Deployment**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

**Once confirmed:**
- The model is deployed automatically to the inference cluster.
- All required Kubernetes Pods, Services, and Endpoints are created.

**Test:**

Run the following command to verify that the model pod is in the `Running` state:
```bash
kubectl get pods
```
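
Beyond pod status, you can also confirm that the supporting Kubernetes objects mentioned above were created. These are generic `kubectl` checks, not part of the deployment script; the resource names will vary by model.

```bash
# List Services and Endpoints in the current namespace; the newly deployed
# model should appear with a matching Service and populated Endpoints
kubectl get svc
kubectl get endpoints

# Optionally watch the pod until it reaches the Running state
kubectl get pods -w
```
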
---

### 3.2 Deploy Models Directly from Hugging Face

This option allows deploying any Hugging Face model, including models not pre-validated by Enterprise Inference.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**4** – Deploy Model from Hugging Face

**Step 3: Provide Hugging Face Model Details**

When prompted to `Enter the Hugging Face Model ID`, enter the desired Hugging Face model ID (example: `mistralai/Mistral-7B-v0.3`).

> **Note:** The model (`mistralai/Mistral-7B-v0.3`) above is only an example. You can enter any compatible Hugging Face model (CPU or Gaudi), depending on your deployment type.

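An optional pre-check, not part of the deployment script: if you want to confirm that a model ID exists and that your Hugging Face token can access it (some models are gated), you can query the public Hugging Face model API. The `HF_TOKEN` variable below is illustrative; substitute your own token.

```bash
# Illustrative pre-check: query the Hugging Face model API to confirm the
# model ID resolves and the token can access it. Adjust the ID and token.
MODEL_ID="mistralai/Mistral-7B-v0.3"
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer ${HF_TOKEN}" \
  "https://huggingface.co/api/models/${MODEL_ID}"
# 200 means the model is visible with this token; 401/403 usually indicates
# a missing token or a gated model whose terms have not been accepted.
```
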
**Step 4: Provide a Deployment Name for the Model**

When prompted to `Enter Deployment Name for the Model`, enter a deployment name for the model (example: `mistral-7b-v0-3`).

> **Naming rules:**
> - Lowercase letters only
> - Numbers and hyphens allowed
> - No spaces or special characters
> - Must follow Kubernetes naming conventions (a quick format check is sketched below)

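The rules above correspond to the RFC 1123 label format that Kubernetes object names must follow. If you want to sanity-check a candidate name before entering it, a minimal illustrative check (not part of the deployment script) looks like this:

```bash
# Illustrative check of a candidate deployment name against the RFC 1123
# label format: lowercase alphanumerics and hyphens, starting and ending
# with an alphanumeric, at most 63 characters long.
name="mistral-7b-v0-3"
if [[ "$name" =~ ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ ]] && [ "${#name}" -le 63 ]; then
  echo "valid deployment name: $name"
else
  echo "invalid deployment name: $name"
fi
```
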
**Step 5: Provide Tensor Parallel Size (Gaudi Only)**

Set the tensor parallel size based on the number of available Gaudi cards.

> **Note:** This option deploys a model that has not been pre-validated. Ensure the tensor parallel size is configured correctly. An incorrect value may cause the model to remain in a "Not Ready" state.

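If you are unsure how many Gaudi cards are available, the Habana `hl-smi` utility lists them, assuming the Habana/Gaudi software stack is installed on the worker node:

```bash
# On a Gaudi worker node with the Habana software stack installed, hl-smi
# lists the available accelerator cards; the card count is an upper bound
# for the tensor parallel size you enter at this prompt.
hl-smi
```
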
**Step 6: Confirm Deployment**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

**Test**

Run the following command to verify that the model pod is in the `Running` state:

```bash
kubectl get pods
```
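
Because models deployed this way are not pre-validated, the pod may stay in a non-`Running` or not-ready state (for example, if the tensor parallel size is wrong or the model download fails). Standard `kubectl` commands can help diagnose this; the pod name below is a placeholder for the name shown by `kubectl get pods`.

```bash
# Inspect a pod that is not reaching the Running state.
# Replace <model-pod-name> with the actual pod name.
kubectl describe pod <model-pod-name>   # events: scheduling, image pull, resource issues
kubectl logs <model-pod-name>           # model server logs, e.g. download or load errors
```
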
---

## 4. Undeploy Models

Enterprise Inference allows you to safely undeploy models that were deployed either from:
- The Enterprise Inference model catalog
- Hugging Face (direct deployment)

### 4.1 Undeploy Models from Enterprise Inference Catalog

This method applies to models deployed from the pre-integrated and validated Enterprise Inference catalog.

**Step 1: Run the Deployment Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**2** – Undeploy Model

**Step 3: Select Model to Remove**

The script displays a list of available models with their model IDs based on the deployment type (CPU or Gaudi).

When prompted to `Enter numbers of models to deploy/remove (comma-separated)`, enter the model ID you want to remove (example: `1`).

**Step 4: Confirm Model Removal**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

> CAUTION: Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss.

**Once confirmed:**
- The model deployment is deleted
- All associated Kubernetes resources are removed

**Test**

Run the following command to confirm that the model pod has been deleted:
```bash
kubectl get pods
```
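
To double-check that nothing was left behind, you can also filter for the removed model's resources; `<model-name>` below is a placeholder for the model you removed.

```bash
# Check whether any pods or services still reference the removed model.
# If grep finds no matches, the fallback echo confirms nothing is left.
kubectl get pods,svc | grep -i "<model-name>" || echo "no resources found for <model-name>"
```
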
---

### 4.2 Undeploy Models Deployed from Hugging Face

Use this method to remove models that were deployed via the **Deploy Model from Hugging Face** option.

**Step 1: Run the Script**

```bash
bash ~/Enterprise-Inference/core/inference-stack-deploy.sh
```

**Step 2: Navigate Through the Menus**

Choose the following options from the menu:

**3** – Update Deployed Inference Cluster

**2** – Manage LLM Models

**5** – Remove Model using deployment name

**Step 3: Provide Deployment Name**

When prompted to `Enter Deployment Name for the Model`, enter the deployment name that was used when the model was deployed (example: `mistral-7b-v0-3`).

> The deployment name must exactly match the name used during model deployment.

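If you are not sure of the exact name, listing the Kubernetes deployments in the cluster (a generic `kubectl` query, not a script feature) shows the names that were used:

```bash
# List deployments so you can copy the exact name used at deploy time;
# the --all-namespaces flag is illustrative, drop it if you work in one namespace.
kubectl get deployments --all-namespaces
```
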
**Step 4: Confirm Removal**

When prompted to `Do you wish to continue? (y/n)`, type **y** to proceed.

> CAUTION: Removing the Inference LLM Model will also remove its associated services and resources, which may cause service downtime and potential data loss.

**Once confirmed:**
- The model deployment is deleted
- All associated Kubernetes resources are removed

**Test**

Run the following command to confirm that the model pod has been deleted:
```bash
kubectl get pods
```