Azure ML deployment: setting low memory request not taking effect #27672

Description

@kristofpanna

Describe the bug

Even if I set a memory request smaller than 500 MB for the Azure ML deployment, at least 500 MB is always requested (as I understand it, because of the storageinitializer init container).

Related command

az ml online-deployment create

Errors

(There is no error, but it does not work as expected.)

Issue script & Debug output

In Azure Machine Learning, there is an inference cluster named "reco-inference", which is an Azure Kubernetes cluster.

There is a custom instance type named "smallmemoryinstancetype" with 100Mi memory request, created like this:

kubectl apply -f smallmemory_instancetype.yaml

where the smallmemory_instancetype.yaml file contains:

apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
    name: smallmemoryinstancetype
spec:
    resources:
        limits:
            cpu: "1"
            memory: "2Gi"
        requests:
            cpu: "10m"
            memory: "100Mi"

There is an Azure ML environment: "machine-learning-recommendation-environment:12" (Linux, Python version: 3.8).
There is a previously registered model: name: "modelname", version: 1. (Model artifact binary size: 78 MB.)
There is an endpoint named "endpointname" created like this:

az ml online-endpoint create --name endpointname --set compute=azureml:reco-inference

We deploy like this:

$azuremlModelId = "azureml:modelname:1"
az ml online-deployment create --name deploymentname -f deploymentConfigTest.yaml --set endpoint_name=endpointname --set model=$azuremlModelId --set environment=azureml:machine-learning-recommendation-environment:12

where deploymentConfigTest.yaml is:

type: kubernetes
app_insights_enabled: true
code_configuration:
  code: .
  scoring_script: score.py
request_settings:
  request_timeout_ms: 3000
  max_queue_wait_ms: 3000
instance_type: smallmemoryinstancetype
instance_count: 1
scale_settings:
  type: default
The deployment is successful. It appears in kubectl describe node as a pod. I can verify that the instance type is successfully set for the deployment (at the endpoint in Azure Machine Learning Studio).
(Screenshot: instance_type_verif)

When inspected with kubectl describe node, I can see the Memory Requests for the pod of my deployment.
It is exactly 500Mi.

Non-terminated Pods:          (13 in total)
  Namespace                   Name                                         CPU Requests  CPU Limits    Memory Requests  Memory Limits  Age
  ---------                   ----                                         ------------  ----------    ---------------  -------------  ---
  default                     deploymentname-endpointname-54d8bf5d5w9dz    110m (5%)     1100m (57%)   500Mi (10%)      2098Mi (45%)   18h
  ...

(I also verified that if I use an instance type with a memory request larger than 500Mi, then more than 500Mi is requested, so the instance type setting itself does take effect on memory requests.)
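The row above can be cross-checked against the instance type with a little arithmetic. The following is only a sketch: the two small parsers handle just the quantity suffixes that appear in this report (`m`, `Mi`, `Gi`), and the dictionary names are made up for illustration.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "10m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """Convert a memory quantity such as "100Mi" or "2Gi" to MiB."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


# Values from smallmemory_instancetype.yaml.
instance_type = {"cpu_request": "10m", "cpu_limit": "1",
                 "mem_request": "100Mi", "mem_limit": "2Gi"}

# Values from the `kubectl describe node` row for the deployment pod.
reported = {"cpu_request": "110m", "cpu_limit": "1100m",
            "mem_request": "500Mi", "mem_limit": "2098Mi"}

# How much of each reported total is NOT explained by the instance type.
remainder = {}
for key in instance_type:
    parse = cpu_millicores if key.startswith("cpu") else memory_mib
    remainder[key] = parse(reported[key]) - parse(instance_type[key])

for key, value in remainder.items():
    unit = "m" if key.startswith("cpu") else "Mi"
    print(f"{key}: {value}{unit} beyond the instance type")
```

Three of the four remainders are small (100m, 100m and 50Mi), which would fit some other small container running in the same pod, although the output itself does not say so. The memory request remainder is 400Mi and does not fit that pattern, which is the anomaly this issue is about.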

As I understand it, this is because of the storageinitializer init container (which is in the same pod as my inference server), whose memory request is fixed at 500Mi.
For example, in the pod settings inspected in Lens, I see these settings for the init container:

initContainers:
- name: storageinitializer-modeldata
  ...
  resources:
    limits:
      cpu: 100m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 500Mi
  ...
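For context, Kubernetes computes a pod's effective request for a resource as the higher of the largest init container request and the sum of the app container requests (ignoring pod overhead and restartable sidecar init containers). The sketch below applies that rule to the numbers in this report. The function name is made up for illustration, and the second app container (100m CPU, 50Mi memory) is an assumption used only to make the totals match the `kubectl describe node` output; it is not shown in the pod spec excerpt above.

```python
def effective_request(init_requests, app_requests):
    """Effective pod request: max(largest init request, sum of app requests)."""
    return max(max(init_requests, default=0), sum(app_requests))


# Memory in Mi. The init container request is fixed at 500Mi.
INIT_MEMORY = [500]
ASSUMED_SIDECAR_MEMORY = 50  # assumption, not taken from the pod spec

# Instance type with a 100Mi request: the init container dominates.
small = effective_request(INIT_MEMORY, [100, ASSUMED_SIDECAR_MEMORY])

# Lowering the instance type request further changes nothing.
smaller = effective_request(INIT_MEMORY, [65, ASSUMED_SIDECAR_MEMORY])

# A hypothetical instance type with a 600Mi request: the app containers dominate.
large = effective_request(INIT_MEMORY, [600, ASSUMED_SIDECAR_MEMORY])

# CPU in millicores: init container 100m, inference server 10m,
# plus an assumed 100m for the other container.
cpu = effective_request([100], [10, 100])

print(small, smaller, large, cpu)
```

Under these assumptions the effective memory request stays at 500Mi for any instance type whose app containers sum to less than that, and only rises above 500Mi once they exceed it. That matches both observations above.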

Expected behavior

I would expect the Memory Requests for the pod of my deployment to be set to a smaller amount than 500Mi.

Environment Summary

OS: Azure DevOps windows-latest agent (Windows Server 2022 with Visual Studio 2022)

azure-cli 2.53.1

core 2.53.1
telemetry 1.1.0

Extensions:
azure-devops 0.26.0
connectedk8s 1.5.2
k8s-extension 1.5.1
ml 2.21.1

Dependencies:
msal 1.24.0b2
azure-mgmt-resource 23.1.0b2

Python location 'C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\python.exe'
Extensions directory 'C:\Users\PannaKristof\.azure\cliextensions'

Python (Windows) 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:05:00) [MSC v.1929 32 bit (Intel)]

Additional context

My question is: why does the memory request I set on my deployment not also apply to the init container?
Is there any other (maybe completely different) way to achieve a smaller memory request for the init container? (For example, it would be ideal if I could also set the init container's request size dynamically when running the online-deployment create command.)
The reason for my question: we would like to run several small deployments, but it is very wasteful to request 500 MB of memory for each of them (when e.g. 65Mi would be sufficient).

(Or is it possible that the init container actually needs this much space to work and I should not try to set the memory request?)

Thank you in advance for your help!

Metadata

Assignees: No one assigned

Labels: Auto-Assign, Machine Learning (az ml), Service Attention, act-codegen-extensibility-squad, bug, customer-reported
