Requesting a Python/CUDA example

Hi,

The existing examples are very good, but since the GPU/AI/ML features were highlighted in [the introductory blog post](https://cloud.google.com/blog/products/compute/new-batch-service-processes-batch-jobs-on-google-cloud) ("Use accelerator-optimized resources."), it would be nice to see a full Python/CUDA example here.

If it helps, I've tried this on my own, but got some errors:

```json
{
    "taskGroups": [
        {
            "taskSpec": {
                "computeResource": {
                    "cpuMilli": "20000",
                    "memoryMib": "15000"
                },
                "runnables": [
                    {
                        "container": {
                            "imageUri": "pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime",
                            "entrypoint": "/bin/sh",
                            "commands": ["-c", "python -c \"import torch;print(torch.cuda.is_available())\""]
                        }
                    }
                ],
                "maxRetryCount": 2,
                "maxRunDuration": "3600s"
            },
            "taskCount": 1,
            "parallelism": 1
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                "instanceTemplate": "alan-test-instance-template-3"
            }
        ]
    },
    "logsPolicy": {
        "destination": "CLOUD_LOGGING"
    }
}
```
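
The nested, escaped quotes in `commands` are easy to get wrong by hand. A minimal sketch of generating the same config from Python instead, in case that is useful for an eventual example. The helper name `build_job` is hypothetical, and using `shlex.quote` for the shell quoting is my own choice, not something the Batch docs prescribe; all field names and values are copied from the JSON above.

```python
import json
import shlex


def build_job(image_uri: str, python_snippet: str, instance_template: str) -> dict:
    """Return a job config shaped like the hand-written JSON above.

    build_job is a hypothetical helper, not part of any Batch client library.
    """
    # shlex.quote wraps the snippet in single quotes so /bin/sh passes it
    # to `python -c` as one argument, avoiding manual backslash escaping.
    shell_command = f"python -c {shlex.quote(python_snippet)}"
    return {
        "taskGroups": [
            {
                "taskSpec": {
                    "computeResource": {"cpuMilli": "20000", "memoryMib": "15000"},
                    "runnables": [
                        {
                            "container": {
                                "imageUri": image_uri,
                                "entrypoint": "/bin/sh",
                                "commands": ["-c", shell_command],
                            }
                        }
                    ],
                    "maxRetryCount": 2,
                    "maxRunDuration": "3600s",
                },
                "taskCount": 1,
                "parallelism": 1,
            }
        ],
        "allocationPolicy": {"instances": [{"instanceTemplate": instance_template}]},
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


job = build_job(
    image_uri="pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime",
    python_snippet="import torch;print(torch.cuda.is_available())",
    instance_template="alan-test-instance-template-3",
)

if __name__ == "__main__":
    # Print the config so it can be redirected to a file and submitted.
    print(json.dumps(job, indent=4))
```

Redirecting the output to a file gives a config equivalent to the one above, except that the Python snippet is single-quoted rather than backslash-escaped.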

The log output is:

```
2022-08-18 09:31:08.760 EDT  Task action/STARTUP/0/0/group0/0, STDOUT: Reading package lists...
2022-08-18 09:31:08.772 EDT  Task action/STARTUP/0/0/group0/0, STDOUT:
2022-08-18 09:31:08.777 EDT  Task action/STARTUP/0/0/group0/0, STDOUT: Building dependency tree...
2022-08-18 09:31:08.904 EDT  Task action/STARTUP/0/0/group0/0, STDOUT: Reading state information...
2022-08-18 09:31:08.905 EDT  Task action/STARTUP/0/0/group0/0, STDOUT:
2022-08-18 09:31:08.954 EDT  Task action/STARTUP/0/0/group0/0, STDOUT: Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies:
2022-08-18 09:31:09.008 EDT  Task action/STARTUP/0/0/group0/0, STDOUT: docker.io : Depends: runc (>= 1.0.0~rc6~)
2022-08-18 09:31:09.019 EDT  Task action/STARTUP/0/0/group0/0, STDERR: E: Unable to correct problems, you have held broken packages.
```
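
The log above ends in the STARTUP action and shows no output from the container, so it does not say whether PyTorch could see the GPU. If the job does get that far, something more verbose than the `torch.cuda.is_available()` one-liner might make the result easier to read in Cloud Logging. A minimal sketch, assuming the script runs inside the `pytorch/pytorch` image from the job config; the function name `cuda_report` is hypothetical.

```python
import json


def cuda_report() -> dict:
    """Collect basic CUDA facts as a dict; cuda_report is a hypothetical helper."""
    try:
        import torch
    except ImportError:
        # Keep the output shape stable even when PyTorch is not installed.
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    # Only enumerate devices when CUDA is usable.
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    # One JSON document on stdout, which the logs policy should forward.
    print(json.dumps(cuda_report(), indent=2))
```

On a VM where the driver and container runtime are set up correctly, I would expect `cuda_available` to be `true` and `devices` to list the attached GPU; I have not been able to confirm that here because the job fails before the container starts.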

And for reference, here's the info for my instance template:

```json
{
  "creationTimestamp": "2022-08-17T14:05:29.128-07:00",
  "description": "",
  "id": "[redacted]",
  "kind": "compute#instanceTemplate",
  "name": "alan-test-instance-template-3",
  "properties": {
    "confidentialInstanceConfig": {
      "enableConfidentialCompute": false
    },
    "description": "",
    "scheduling": {
      "onHostMaintenance": "TERMINATE",
      "provisioningModel": "STANDARD",
      "automaticRestart": true,
      "preemptible": false
    },
    "tags": {},
    "disks": [
      {
        "type": "PERSISTENT",
        "deviceName": "alan-test-instance-template-3",
        "autoDelete": true,
        "index": 0,
        "boot": true,
        "kind": "compute#attachedDisk",
        "mode": "READ_WRITE",
        "initializeParams": {
          "sourceImage": "projects/ml-images/global/images/c0-deeplearning-common-cu110-v20220806-debian-10",
          "diskType": "pd-balanced",
          "diskSizeGb": "100"
        }
      },
      {
        "type": "PERSISTENT",
        "deviceName": "persistent-disk-1",
        "autoDelete": false,
        "index": 1,
        "kind": "compute#attachedDisk",
        "mode": "READ_WRITE",
        "initializeParams": {
          "description": "",
          "diskType": "pd-balanced",
          "diskSizeGb": "100"
        }
      }
    ],
    "networkInterfaces": [
      {
        "name": "nic0",
        "network": "projects/[redacted]/global/networks/default",
        "accessConfigs": [
          {
            "name": "External NAT",
            "type": "ONE_TO_ONE_NAT",
            "kind": "compute#accessConfig",
            "networkTier": "PREMIUM"
          }
        ],
        "kind": "compute#networkInterface"
      }
    ],
    "reservationAffinity": {
      "consumeReservationType": "ANY_RESERVATION"
    },
    "canIpForward": false,
    "keyRevocationActionType": "NONE",
    "machineType": "n1-standard-4",
    "metadata": {
      "fingerprint": "[redacted]",
      "kind": "compute#metadata"
    },
    "shieldedVmConfig": {
      "enableSecureBoot": false,
      "enableVtpm": true,
      "enableIntegrityMonitoring": true
    },
    "shieldedInstanceConfig": {
      "enableSecureBoot": false,
      "enableVtpm": true,
      "enableIntegrityMonitoring": true
    },
    "serviceAccounts": [
      {
        "email": "[redacted]@developer.gserviceaccount.com",
        "scopes": [
          "https://www.googleapis.com/auth/devstorage.read_only",
          "https://www.googleapis.com/auth/logging.write",
          "https://www.googleapis.com/auth/monitoring.write",
          "https://www.googleapis.com/auth/servicecontrol",
          "https://www.googleapis.com/auth/service.management.readonly",
          "https://www.googleapis.com/auth/trace.append"
        ]
      }
    ],
    "guestAccelerators": [
      {
        "acceleratorCount": 1,
        "acceleratorType": "nvidia-tesla-t4"
      }
    ],
    "displayDevice": {
      "enableDisplay": false
    }
  },
  "selfLink": "projects/[redacted]/global/instanceTemplates/alan-test-instance-template-3"
}
```
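
One thing that may be worth ruling out, though it could well be unrelated to the apt error in the log: the job config asks for `cpuMilli` of 20000 (20 vCPUs if 1000 milliCPU is one vCPU), while the template's `n1-standard-4` machine type has 4 vCPUs. A rough sketch of that sanity check follows. The `MACHINE_TYPES` table is hand-copied from the public machine type listing and is an assumption, as are the helper name `check_fit` and the idea that Batch compares these values at all when an instance template is used.

```python
# Assumed specs, hand-copied from the public machine type listing.
# n1-standard-4: 4 vCPUs, 15 GB of memory (15 * 1024 MiB).
MACHINE_TYPES = {
    "n1-standard-4": {"vcpus": 4, "memory_mib": 15 * 1024},
}


def check_fit(compute_resource: dict, machine_type: str) -> list:
    """Return a list of human-readable problems; empty if the task fits.

    check_fit is a hypothetical helper, not part of any Batch tooling.
    """
    spec = MACHINE_TYPES[machine_type]
    problems = []

    # The job config stores these values as strings (int64 format).
    cpu_milli = int(compute_resource["cpuMilli"])
    memory_mib = int(compute_resource["memoryMib"])

    if cpu_milli > spec["vcpus"] * 1000:
        problems.append(
            f"cpuMilli {cpu_milli} exceeds {spec['vcpus'] * 1000} available on {machine_type}"
        )
    if memory_mib > spec["memory_mib"]:
        problems.append(
            f"memoryMib {memory_mib} exceeds {spec['memory_mib']} available on {machine_type}"
        )
    return problems


problems = check_fit({"cpuMilli": "20000", "memoryMib": "15000"}, "n1-standard-4")

if __name__ == "__main__":
    for problem in problems:
        print(problem)
```

With the values from this issue, the check flags the CPU request but not the memory request.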

EDIT: Digging through the `Job` spec to the [`ComputeResource` spec](https://cloud.google.com/batch/docs/reference/rest/v1alpha/projects.locations.jobs#ComputeResource), I see the following:

```
gpuCount
string (int64 format)

The GPU count.

Not yet implemented.
```

Does this imply GPU jobs are not yet supported?
