The NetApp DataOps Toolkit for Kubernetes can be used to manage inference servers within a Kubernetes cluster. The toolkit provides the ability to deploy, list, and delete NVIDIA Triton Inference Server instances.
You can perform the following operations using the toolkit's command line utility:
| Triton Inference Server operations | Supported by BeeGFS | Supported by Trident |
|---|---|---|
| Deploy a new NVIDIA Triton Inference Server. | Yes | Yes |
| Delete an NVIDIA Triton Inference Server. | Yes | Yes |
| List all NVIDIA Triton Inference Servers in a specific namespace. | Yes | Yes |
The NetApp DataOps Toolkit can enable a user to deploy an NVIDIA Triton Inference Server instance on-demand. The command for deploying an NVIDIA Triton Inference Server instance is netapp_dataops_k8s_cli.py create triton-server.
The following options/arguments are required:
-s, --server-name= Name of a new Triton Inference Server.
-v, --model-repo-pvc-name= Name of the PVC containing the model repository.
The following options/arguments are optional:
-g, --nvidia-gpu= Number of NVIDIA GPUs to allocate to the Triton instance. Format: '1', '4', etc. If not specified, no GPUs will be allocated.
-h, --help Print help text.
-i, --image= Container image to use when creating Triton instance. If not specified, "nvcr.io/nvidia/tritonserver:21.11-py3" will be used.
-m, --memory= Amount of memory to reserve for Triton instance. Format: '1024Mi', '100Gi', '10Ti', etc. If not specified, no memory will be reserved.
-n, --namespace= Kubernetes namespace to create new server instance in. If not specified, server will be created in namespace "default".
-p, --cpu= Number of CPUs to reserve for Triton instance. Format: '0.5', '1', etc. If not specified, no CPUs will be reserved.
-b, --load-balancer Option to use a LoadBalancer instead of using NodePort service. If not specified, NodePort service will be utilized.
-r, --allocate-resource= Option to specify custom resource allocations, ex. 'nvidia.com/mig-1g.5gb=1'. If not specified, no custom resource will be allocated.
Deploy a new NVIDIA Triton Inference Server instance using a LoadBalancer service.
netapp_dataops_k8s_cli.py create triton-server --server-name=lb1 --model-repo-pvc-name=model-pvc --load-balancer
Creating Service 'ntap-dsutil-triton-lb1' in namespace 'default'.
Service successfully created.
Creating Deployment 'ntap-dsutil-triton-lb1' in namespace 'default'.
Deployment 'ntap-dsutil-triton-lb1' created.
Waiting for Deployment 'ntap-dsutil-triton-lb1' to reach Ready state.
Deployment successfully created.
Server successfully created.
Server endpoints:
http: 10.61.188.115:30601
grpc: 10.61.188.115:31835
metrics: 10.61.188.115:31880/metrics

The NetApp DataOps Toolkit can enable a user to near-instantaneously delete an existing NVIDIA Triton Inference Server instance. The command for deleting an NVIDIA Triton Inference Server instance is netapp_dataops_k8s_cli.py delete triton-server.
The following options/arguments are required:
-s, --server-name= Name of the Triton Inference Server instance to be deleted.
The following options/arguments are optional:
-f, --force Do not prompt user to confirm operation.
-h, --help Print help text.
-n, --namespace= Kubernetes namespace that the server instance is located in. If not specified, namespace "default" will be used.
Delete the NVIDIA Triton Inference Server 'mike' in namespace 'dsk-test'.
netapp_dataops_k8s_cli.py delete triton-server --server-name=mike --namespace=dsk-test
Warning: This server will be permanently deleted.
Are you sure that you want to proceed? (yes/no): yes
Deleting server 'mike' in namespace 'dsk-test'.
Note: this operation does NOT delete the model repository PVC.
Deleting Deployment...
Deleting Service...
Triton Server instance successfully deleted.

The NetApp DataOps Toolkit can be used to print a list of all existing Triton Inference Servers in a specific namespace within a Kubernetes cluster. The command for printing a list of all existing NVIDIA Triton Inference Servers is netapp_dataops_k8s_cli.py list triton-servers.
No options/arguments are required for this command.
The following options/arguments are optional:
-h, --help Print help text.
-n, --namespace= Kubernetes namespace for which to retrieve list of servers. If not specified, namespace "default" will be used.
List all NVIDIA Triton Inference Server instances in namespace "dsk-test".
netapp_dataops_k8s_cli.py list triton-servers --namespace=dsk-test
Server Name Status HTTP Endpoints gRPC Endpoint Metrics Endpoint
------------- --------- ------------------- ------------------- -------------------
imagesufian Ready 10.61.188.115:31102 10.61.188.115:31608 10.61.188.115:31149
imagesufian1 Not Ready 10.61.188.115:30744 10.61.188.115:32689 10.61.188.115:30772

The NetApp DataOps Toolkit for Kubernetes provides a set of functions that can be imported into any Python program or Jupyter Notebook. In this manner, data scientists and data engineers can easily incorporate Kubernetes-native data management tasks into their existing projects, programs, and workflows. This functionality is only recommended for advanced users who are proficient in Python.
from netapp_dataops.k8s import create_triton_server
from netapp_dataops.k8s import delete_triton_server
from netapp_dataops.k8s import list_triton_servers

The following server management operations are available within the set of functions.
| Triton Inference Server operations | Supported by BeeGFS | Supported by Trident |
|---|---|---|
| Deploy a new NVIDIA Triton Inference Server. | Yes | Yes |
| Delete an NVIDIA Triton Inference Server. | Yes | Yes |
| List all NVIDIA Triton Inference Servers in a specific namespace. | Yes | Yes |
The NetApp DataOps Toolkit can enable a user to deploy an NVIDIA Triton Inference Server instance on-demand.
def create_triton_server(
server_name: str, # Name of the Triton Inference Server instance (required).
model_pvc_name: str, # Name of the PVC containing the model repository.
load_balancer_service: bool = False, # Option to use a LoadBalancer instead of using NodePort service. If not specified, NodePort service will be utilized.
namespace: str = "default", # Kubernetes namespace to create new server in. If not specified, server will be created in namespace "default".
server_image: str = "nvcr.io/nvidia/tritonserver:21.11-py3", # Container image to use when creating instance. If not specified, "nvcr.io/nvidia/tritonserver:21.11-py3" will be used.
request_cpu: str = None, # Number of CPUs to reserve for Triton instance. Format: '0.5', '1', etc. If not specified, no CPUs will be reserved.
request_memory: str = None, # Amount of memory to reserve for Triton instance. Format: '1024Mi', '100Gi', '10Ti', etc. If not specified, no memory will be reserved.
request_nvidia_gpu: str = None, # Number of NVIDIA GPUs to allocate to Triton instance. Format: '1', '4', etc. If not specified, no GPUs will be allocated.
allocate_resource: str = None, # Option to specify custom resource allocations, ex. 'nvidia.com/mig-1g.5gb=1'. If not specified, no custom resource will be allocated.
print_output: bool = False # Denotes whether or not to print messages to the console during execution.
) -> str :

This function will return a list of server endpoints (in string format): ['<http_uri>', '<grpc_uri>', '<metrics_uri>']
If an error is encountered, the function will raise an exception of one of the following types. These exception types are defined in netapp_dataops.k8s.
InvalidConfigError # kubeconfig file is missing or is invalid.
APIConnectionError # The Kubernetes API returned an error.
ServiceUnavailableError # A Kubernetes service is not available.

The NetApp DataOps Toolkit can enable a user to near-instantaneously delete an existing NVIDIA Triton Server instance.
def delete_triton_server(
server_name: str, # Name of NVIDIA Triton Server instance to be deleted (required).
namespace: str = "default", # Kubernetes namespace that the server is located in. If not specified, namespace "default" will be used.
print_output: bool = False # Denotes whether or not to print messages to the console during execution.
) :

This function does not return a value.
If an error is encountered, the function will raise an exception of one of the following types. These exception types are defined in netapp_dataops.k8s.
InvalidConfigError # kubeconfig file is missing or is invalid.
APIConnectionError # The Kubernetes API returned an error.

The NetApp DataOps Toolkit can be used to retrieve a list of all existing NVIDIA Triton Server instances in a specific namespace within a Kubernetes cluster as part of any Python program or workflow.
def list_triton_servers(
namespace: str = "default", # Kubernetes namespace for which to retrieve list of servers. If not specified, namespace "default" will be used.
print_output: bool = False # Denotes whether or not to print messages to the console during execution.
) -> list :

The function returns a list of all existing NVIDIA Triton Server instances. Each item in the list will be a dictionary containing details regarding a specific server. The keys for the values in this dictionary are "Server Name", "Status", "Server Endpoints".
If an error is encountered, the function will raise an exception of one of the following types. These exception types are defined in netapp_dataops.k8s.
InvalidConfigError # kubeconfig file is missing or is invalid.
APIConnectionError # The Kubernetes API returned an error.

Report any issues via GitHub: https://github.com/NetApp/netapp-data-science-toolkit/issues.