Azure ML deployment: setting low memory request not taking effect #27672

### Describe the bug

Even when I set a memory request smaller than 500Mi for an Azure ML deployment, the pod always requests at least 500Mi (as I understand it, because of the `storageinitializer` init container).

### Related command

`az ml online-deployment create`

### Errors

(There is no error, but it does not work as expected.)

### Issue script & Debug output

In Azure Machine Learning, there is an inference cluster named "reco-inference", which is an Azure Kubernetes cluster.

There is a custom instance type named "smallmemoryinstancetype" with 100Mi memory request, created like this:

```kubectl
kubectl apply -f smallmemory_instancetype.yaml
```

where the `smallmemory_instancetype.yaml` file contains:

```yaml
apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
    name: smallmemoryinstancetype
spec:
    resources:
        limits:
            cpu: "1"
            memory: "2Gi"
        requests:
            cpu: "10m"
            memory: "100Mi"
```

There is an Azure ML environment: "machine-learning-recommendation-environment:12" (Linux, Python version: 3.8).
There is a previously registered model: name: "modelname", version: 1. (Model artifact binary size: 78 MB.)
There is an endpoint named "endpointname" created like this:

```ps
az ml online-endpoint create --name endpointname --set compute=azureml:reco-inference
```

We deploy like this:

```ps
$azuremlModelId = "azureml:modelname:1"
az ml online-deployment create --name deploymentname -f deploymentConfigTest.yaml --set endpoint_name=endpointname --set model=$azuremlModelId --set environment=azureml:machine-learning-recommendation-environment:12
```

where `deploymentConfigTest.yaml` is:

```yaml
type: kubernetes
app_insights_enabled: true
code_configuration:
  code: .
  scoring_script: score.py
request_settings:
  request_timeout_ms: 3000
  max_queue_wait_ms: 3000
instance_type: smallmemoryinstancetype
instance_count: 1
scale_settings:
  type: default
```

The deployment is successful. It appears in `kubectl describe node` as a pod. I can verify that the instance type is successfully set for the deployment (at the endpoint in Azure Machine Learning Studio).
![instance_type_verif](https://github.com/Azure/azure-cli/assets/26227564/8acdf67c-3c26-4105-aff4-f1bbd0977675)

When inspected with `kubectl describe node`, I can see the Memory Requests for the pod of my deployment.
It is exactly 500Mi.

```
Non-terminated Pods:          (13 in total)
  Namespace                   Name                                         CPU Requests  CPU Limits    Memory Requests  Memory Limits  Age
  ---------                   ----                                         ------------  ----------    ---------------  -------------  ---
  default                     deploymentname-endpointname-54d8bf5d5w9dz    110m (5%)     1100m (57%)   500Mi (10%)      2098Mi (45%)   18h
  ...
```

(I also verified that if I use an instance type with a memory request larger than 500Mi, the pod requests more than 500Mi, so the instance type setting itself does take effect on memory requests.)

As I understand it, this is because of the storageinitializer init container (which is in the same pod as my inference server), whose memory request is fixed at 500Mi.
For example, in the pod settings inspected in Lens, I see these settings for the init container:

```yaml
initContainers:
- name: storageinitializer-modeldata
  ...
  resources:
    limits:
      cpu: 100m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 500Mi
  ...
```
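
To make the reasoning above concrete: Kubernetes computes a pod's effective request for a resource as the larger of (a) the highest request among its init containers and (b) the sum of the requests of its regular containers. The following is a minimal Python sketch of that rule, not anything taken from Azure ML. The function names are hypothetical, and it assumes the pod has a single init container requesting 500Mi and ignores any extra sidecar containers the pod may contain.

```python
def parse_mem_mi(quantity: str) -> float:
    """Convert a Kubernetes memory quantity with a binary suffix to MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def effective_request_mi(init_requests, app_requests):
    """Effective pod memory request in MiB.

    Init containers run one at a time, so only the largest one counts;
    regular containers run together, so their requests are summed.
    """
    largest_init = max((parse_mem_mi(q) for q in init_requests), default=0)
    app_total = sum(parse_mem_mi(q) for q in app_requests)
    return max(largest_init, app_total)


# Assumed values: 500Mi init container, 100Mi from the custom instance type.
print(effective_request_mi(["500Mi"], ["100Mi"]))  # 500.0
# With a larger instance type request, the regular containers dominate.
print(effective_request_mi(["500Mi"], ["600Mi"]))  # 600.0
```

Under these assumptions the 500Mi floor comes entirely from the init container, which would match both observations: a 100Mi instance type still shows 500Mi, while an instance type above 500Mi raises the pod's request.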

### Expected behavior

I would expect the Memory Requests for the pod of my deployment to be set to a smaller amount than 500Mi.

### Environment Summary

OS: Azure DevOps windows-latest agent (Windows Server 2022 with Visual Studio 2022)

azure-cli                         2.53.1

core                              2.53.1
telemetry                          1.1.0

Extensions:
azure-devops                      0.26.0
connectedk8s                       1.5.2
k8s-extension                      1.5.1
ml                                2.21.1

Dependencies:
msal                            1.24.0b2
azure-mgmt-resource             23.1.0b2

Python location 'C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\python.exe'
Extensions directory 'C:\Users\PannaKristof\.azure\cliextensions'

Python (Windows) 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:05:00) [MSC v.1929 32 bit (Intel)]

### Additional context

My question is: why does the memory request I set on my deployment not also take effect on the init container?
Is there any other (maybe completely different) way to achieve a **smaller memory request for the init container**? (For example, it would be ideal if I could also set the request size for the init container dynamically when running the `online-deployment create` command.)
The reason for my question: we would like to deploy several small deployments, but it is very wasteful to request 500 MB of memory for each of them (when e.g. 65Mi would be sufficient).

(Or is it possible that the init container actually needs this much memory to work, and I should not try to lower its memory request?)

Thank you in advance for your help!
