This repository was archived by the owner on Sep 18, 2025. It is now read-only.
1) The following variables come with defaults but can be overridden with appropriate values:

    - -e TENSOR_PARALLEL_SIZE (Optional, number of cards to use. If not set, a default will be chosen)
    - -e MAX_MODEL_LEN (Optional, set a length that suits your workload. If not set, a default will be chosen)
2) Example for bringing up a vLLM server with a custom max model length and tensor parallel (TP) size. Proxy variables and volumes added for reference.
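A minimal sketch of such a command is shown below. The image name, model ID, cache path, and the specific values chosen are illustrative placeholders, not confirmed by this README; substitute the image and settings appropriate for your setup.

```shell
# Hedged sketch: <vllm-gaudi-image> and the values below are placeholders.
# MAX_MODEL_LEN and TENSOR_PARALLEL_SIZE override the built-in defaults.
docker run -d --runtime=habana \
    --name vllm-server \
    --cap-add=sys_nice \
    --ipc=host \
    -p 8000:8000 \
    -e HABANA_VISIBLE_DEVICES=all \
    -e HTTP_PROXY=$HTTP_PROXY \
    -e HTTPS_PROXY=$HTTPS_PROXY \
    -e NO_PROXY=$NO_PROXY \
    -e MAX_MODEL_LEN=4096 \
    -e TENSOR_PARALLEL_SIZE=2 \
    -v /data/huggingface:/root/.cache/huggingface \
    <vllm-gaudi-image>
```

The volume mount caches downloaded model weights on the host so repeated container starts do not re-download them; the proxy variables are only needed behind a corporate proxy.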
3) Example for bringing up two Llama-70B instances with the recommended number of TP/cards. Each instance should have unique values for HABANA_VISIBLE_DEVICES, host port and instance name.
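The two-instance setup above can be sketched as follows. The image name, the choice of four cards per instance, and the port numbers are assumptions for illustration, not values stated in this README; only the requirement that HABANA_VISIBLE_DEVICES, host port, and container name differ between instances comes from the text.

```shell
# Hedged sketch: <vllm-gaudi-image> and a TP size of 4 per instance are
# placeholders. Each instance gets a disjoint card set, a unique host
# port, and a unique container name.

# Instance 1: cards 0-3, served on host port 8000
docker run -d --runtime=habana --cap-add=sys_nice --ipc=host \
    --name vllm-llama70b-0 \
    -p 8000:8000 \
    -e HABANA_VISIBLE_DEVICES=0,1,2,3 \
    -e TENSOR_PARALLEL_SIZE=4 \
    <vllm-gaudi-image>

# Instance 2: cards 4-7, served on host port 8001
docker run -d --runtime=habana --cap-add=sys_nice --ipc=host \
    --name vllm-llama70b-1 \
    -p 8001:8000 \
    -e HABANA_VISIBLE_DEVICES=4,5,6,7 \
    -e TENSOR_PARALLEL_SIZE=4 \
    <vllm-gaudi-image>
```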
For information on how to set HABANA_VISIBLE_DEVICES for a specific TP size, see [docs.habana.ai - Multiple Tenants](https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html)