
feat(vpc): support an ephemeral scratch data volume on the builder VSI#157

Merged
deepakibms merged 3 commits into IBM:master from cchristous:feat/vpc-data-volumes on Jun 30, 2026

@cchristous (Contributor) commented Jun 26, 2026

What

Adds an optional ephemeral scratch data volume to the ibmcloud-vpc builder, configured with four new fields:

vsi_data_vol_capacity  = 60       # GB (10–32000); attaching the volume is opt-in
vsi_data_vol_profile   = "sdp"    # general-purpose | 5iops-tier | 10iops-tier | sdp | custom
vsi_data_vol_iops      = 10000    # honored on custom/sdp
vsi_data_vol_bandwidth = 2000     # honored on sdp only

The volume is attached to the builder instance with DeleteVolumeOnInstanceDelete=true, so it is removed together with the throwaway VSI, and it is never part of the captured image — image capture targets the boot volume only (CreateImageImagePrototypeImageBySourceVolume).
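A minimal Go sketch of the opt-in attachment builder described above. The struct types are simplified, hypothetical stand-ins (not the IBM VPC Go SDK's actual prototype types), and DataVolumeConfig is an assumed shape for the four vsi_data_vol_* fields. What it illustrates is the control flow: nil when no capacity is set, otherwise exactly one attachment flagged DeleteVolumeOnInstanceDelete=true, with unset tuning knobs left off the request.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the SDK prototype types; the real
// IBM VPC Go SDK types are named differently and carry more fields.
type VolumePrototype struct {
	CapacityGB *int64
	Profile    *string
	Iops       *int64
	Bandwidth  *int64
}

type VolumeAttachmentPrototype struct {
	DeleteVolumeOnInstanceDelete *bool
	Volume                       *VolumePrototype
}

// DataVolumeConfig mirrors the four vsi_data_vol_* fields (0 or "" = unset).
type DataVolumeConfig struct {
	Capacity  int
	Profile   string
	Iops      int
	Bandwidth int
}

// optInt64 returns nil for unset (zero) values so they are left off the request.
func optInt64(v int) *int64 {
	if v == 0 {
		return nil
	}
	x := int64(v)
	return &x
}

// dataVolumeAttachments returns nil unless a capacity is configured (opt-in);
// otherwise it returns one attachment that is deleted with the builder VSI.
func dataVolumeAttachments(c DataVolumeConfig) []VolumeAttachmentPrototype {
	if c.Capacity == 0 {
		return nil
	}
	vol := &VolumePrototype{
		CapacityGB: optInt64(c.Capacity),
		Iops:       optInt64(c.Iops),
		Bandwidth:  optInt64(c.Bandwidth),
	}
	if c.Profile != "" {
		profile := c.Profile
		vol.Profile = &profile
	}
	deleteWithInstance := true
	return []VolumeAttachmentPrototype{{
		DeleteVolumeOnInstanceDelete: &deleteWithInstance,
		Volume:                       vol,
	}}
}

func main() {
	atts := dataVolumeAttachments(DataVolumeConfig{Capacity: 60, Profile: "sdp", Iops: 10000})
	v := atts[0].Volume
	fmt.Println(len(atts), *v.CapacityGB, *v.Profile, *v.Iops, v.Bandwidth == nil) // 1 60 sdp 10000 true
	fmt.Println(dataVolumeAttachments(DataVolumeConfig{}) == nil)                  // true
}
```

Returning a nil slice rather than an empty one keeps the "unset" case indistinguishable from a build that never heard of data volumes.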

Why

A VPC custom image is captured from the boot volume, and the export cost scales with how much of that volume has ever been written, not with the live filesystem — deleting files before capture does not shrink it. Today every byte a build touches (apt/package caches, language module caches, from-source build trees) lands on the boot volume and inflates both the captured image size and the capture time, even after cleanup.

There was previously no way to attach additional scratch storage to the builder. With this change a build can mount a scratch data volume and point its cache/build directories at it, keeping that transient churn off the boot volume so it is never exported at capture time. The included examples/build.vpc.data.volume.pkr.hcl shows the guest-side mount that makes the volume effective.
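The guest-side mount only helps if the build can find the new disk reliably. Below is a hypothetical Go sketch of that selection logic (not the shipped example's code): pick the first unpartitioned, unmounted whole disk whose name starts with vd or nvme. The BlockDevice type, the 10 GB size floor (borrowed from the minimum vsi_data_vol_capacity), and the possible presence of a small config drive are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// BlockDevice is a hypothetical, simplified view of one row of `lsblk` output.
type BlockDevice struct {
	Name       string // kernel name, e.g. "vda" or "nvme1n1"
	Type       string // "disk" or "part"
	SizeBytes  int64
	Mountpoint string // empty when not mounted
	HasParts   bool   // true if the disk already carries partitions
}

// minScratchBytes matches the 10 GB lower bound on vsi_data_vol_capacity and
// skips tiny devices such as a config drive, if one is attached (assumption).
const minScratchBytes = 10 * 1000 * 1000 * 1000

// pickScratchDisk returns the first unpartitioned, unmounted whole disk named
// vd* or nvme* that is at least minScratchBytes large.
func pickScratchDisk(devs []BlockDevice) (string, bool) {
	for _, d := range devs {
		if d.Type != "disk" || d.HasParts || d.Mountpoint != "" || d.SizeBytes < minScratchBytes {
			continue
		}
		if strings.HasPrefix(d.Name, "vd") || strings.HasPrefix(d.Name, "nvme") {
			return "/dev/" + d.Name, true
		}
	}
	return "", false
}

func main() {
	devs := []BlockDevice{
		{Name: "vda", Type: "disk", SizeBytes: 100 << 30, HasParts: true}, // boot volume
		{Name: "vdb", Type: "disk", SizeBytes: 374 << 10},                 // small config drive
		{Name: "vdc", Type: "disk", SizeBytes: 60 << 30},                  // scratch data volume
	}
	dev, ok := pickScratchDisk(devs)
	fmt.Println(dev, ok) // /dev/vdc true
}
```

Excluding partitioned disks is what keeps the boot volume out; the size floor is a cheap guard against mounting the wrong unpartitioned device.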

Measured impact: in an s390x Ubuntu image bake dominated by from-source builds (multi-GB Go module/compile caches plus shallow clones), moving those caches onto a scratch data volume cut custom-image capture from ~26–36 min to ~12 min end-to-end (~2–3×), with the captured image's contents unchanged. The gain scales with how much transient data the build writes; a light build sees little, a churn-heavy one sees a lot.
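The ~2–3× figure follows directly from the reported end-to-end capture times before and after moving the caches:

```latex
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}:\quad
\frac{26\ \text{min}}{12\ \text{min}} \approx 2.2\times,\qquad
\frac{36\ \text{min}}{12\ \text{min}} = 3.0\times

\text{saved per capture} = t_{\text{before}} - t_{\text{after}}:\quad
26 - 12 = 14\ \text{min},\qquad 36 - 12 = 24\ \text{min}
```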

Design / why this shape

  • Deliberately mirrors the merged boot-volume work in #151 (Support sdp/custom boot volumes with configurable iops and bandwidth): flat vsi_data_vol_* fields (the plugin has no nested-block config), a single dataVolumeAttachments helper, and Config.Prepare validation with the same rules (capacity bounds, profile allowlist, iops honored only on custom or sdp and bandwidth only on sdp, capacity required when tuned, no negatives). Keeping the boot and data structures parallel was an explicit goal.
  • The attachment is wired into all four create paths (by-image, catalog-offering, by-volume-id, by-snapshot) — all four instance prototypes accept VolumeAttachments. A data volume is independent of the boot source, so attaching scratch storage is valid even when booting from an existing volume or snapshot.
  • bandwidth is honored only on the sdp profile: custom pins IOPS but does not honor bandwidth, so pinning both IOPS and bandwidth requires sdp.
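A minimal Go sketch of the validation rules listed above, following the rule as finally merged in the follow-up commit in this thread (iops on custom or sdp, bandwidth on sdp only). The DataVolumeConfig shape, the function name, and the error wording are hypothetical; treating any of profile/iops/bandwidth as "tuned" is an assumption about what the real check covers.

```go
package main

import (
	"errors"
	"fmt"
)

// DataVolumeConfig mirrors the four vsi_data_vol_* fields (0 or "" = unset).
type DataVolumeConfig struct {
	Capacity  int // GB
	Profile   string
	Iops      int
	Bandwidth int
}

var allowedProfiles = map[string]bool{
	"general-purpose": true, "5iops-tier": true, "10iops-tier": true,
	"sdp": true, "custom": true,
}

// validateDataVolume collects every problem instead of stopping at the first,
// in the style of a Packer Prepare step.
func validateDataVolume(c DataVolumeConfig) []error {
	var errs []error
	if c.Capacity < 0 || c.Iops < 0 || c.Bandwidth < 0 {
		errs = append(errs, errors.New("vsi_data_vol_* values must not be negative"))
	}
	tuned := c.Profile != "" || c.Iops != 0 || c.Bandwidth != 0
	if c.Capacity == 0 {
		if tuned {
			errs = append(errs, errors.New("vsi_data_vol_capacity is required when the data volume is tuned"))
		}
		return errs // volume not requested; nothing else to check
	}
	if c.Capacity > 0 && (c.Capacity < 10 || c.Capacity > 32000) {
		errs = append(errs, fmt.Errorf("vsi_data_vol_capacity %d is outside 10-32000 GB", c.Capacity))
	}
	if c.Profile != "" && !allowedProfiles[c.Profile] {
		errs = append(errs, fmt.Errorf("unknown vsi_data_vol_profile %q", c.Profile))
	}
	if c.Iops > 0 && c.Profile != "custom" && c.Profile != "sdp" {
		errs = append(errs, errors.New("vsi_data_vol_iops requires profile custom or sdp"))
	}
	if c.Bandwidth > 0 && c.Profile != "sdp" {
		errs = append(errs, errors.New("vsi_data_vol_bandwidth requires profile sdp"))
	}
	return errs
}

func main() {
	ok := DataVolumeConfig{Capacity: 60, Profile: "sdp", Iops: 10000, Bandwidth: 2000}
	fmt.Println(len(validateDataVolume(ok))) // 0
	for _, err := range validateDataVolume(DataVolumeConfig{Capacity: 60, Profile: "custom", Bandwidth: 2000}) {
		fmt.Println(err) // vsi_data_vol_bandwidth requires profile sdp
	}
}
```

Because an all-zero config returns no errors, the validation can only reject configs that actually use the new fields, which is the property the Risk section relies on.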

Risk

  • Blast radius: Low. The feature is fully opt-in — dataVolumeAttachments returns nil when vsi_data_vol_capacity is unset, and an unset VolumeAttachments (omitempty) is a no-op. Existing builds are byte-for-byte unaffected.
  • Change criticality: Low. Peripheral, additive config plumbing; no change to capture, networking, auth, or any existing path.
  • Overall: Low.
  • Mitigations: opt-in by default, parallel to the already-merged #151 (Support sdp/custom boot volumes with configurable iops and bandwidth), and unit tests covering every validation branch and the attachment builder.
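The "unset VolumeAttachments (omitempty) is a no-op" claim comes down to standard encoding/json behavior: a field tagged omitempty whose slice is nil or empty is dropped from the request body. A small sketch, where InstancePrototype and VolumeAttachment are hypothetical stand-ins for the SDK request types and only the tag matters:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-ins for the SDK request types; only the omitempty
// tag on VolumeAttachments matters here.
type VolumeAttachment struct {
	DeleteVolumeOnInstanceDelete bool  `json:"delete_volume_on_instance_delete"`
	CapacityGB                   int64 `json:"capacity"`
}

type InstancePrototype struct {
	Name              string             `json:"name"`
	VolumeAttachments []VolumeAttachment `json:"volume_attachments,omitempty"`
}

// requestBody marshals a prototype the way an HTTP client would send it.
func requestBody(p InstancePrototype) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err) // cannot happen for these plain types
	}
	return string(b)
}

func main() {
	// Helper returned nil (no vsi_data_vol_capacity): the key is omitted entirely.
	fmt.Println(requestBody(InstancePrototype{Name: "builder"}))
	// {"name":"builder"}

	// Capacity set: one attachment is serialized.
	fmt.Println(requestBody(InstancePrototype{
		Name:              "builder",
		VolumeAttachments: []VolumeAttachment{{DeleteVolumeOnInstanceDelete: true, CapacityGB: 60}},
	}))
	// {"name":"builder","volume_attachments":[{"delete_volume_on_instance_delete":true,"capacity":60}]}
}
```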

Why this is safe

Nothing runs differently unless a user sets vsi_data_vol_capacity. The new validation only adds rejections for misconfigured data-volume fields; it cannot affect a config that doesn't use them. The full suite passes (126 tests), and the change reuses the exact pattern reviewers already accepted for the boot volume.

Validation

  • go build ./..., go vet, and go test ./... (126 tests) all pass; gofmt clean.
  • Exercised end-to-end in a real VPC image bake: the data volume attached to the builder, was correctly excluded from the captured image, and the image reached AVAILABLE — see the measured capture-time impact above.
  • config.hcl2spec.go was regenerated by hand to mirror what go generate emits: packer-sdc (v0.2.9's bundled x/tools) cannot parse Go 1.25+ export data and fails on all three configs in this environment. The added entries are mechanical (*int/cty.Number, *string/cty.String) and identical in shape to the boot-volume entries from #151 (Support sdp/custom boot volumes with configurable iops and bandwidth); go vet confirms the struct and spec map stay in sync.
  • packer fmt was not run locally (packer not installed); the example was formatted by hand to match the sibling examples.
  • Known limitation, inherited from #151 (Support sdp/custom boot volumes with configurable iops and bandwidth): the per-create-path assignment of VolumeAttachments is not asserted by a test (there is no step.Run harness in the package); the dataVolumeAttachments helper itself is unit-tested.
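One way the per-create-path gap could be closed without a step.Run harness is a table-driven check, provided the four prototype constructions were factored behind small builder functions. The sketch below is hypothetical throughout: prototypeBuilders, instancePrototype, and pathsDroppingAttachments do not exist in the plugin, where each path builds its SDK type inline.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

type VolumeAttachment struct {
	CapacityGB int64
	Profile    string
}

// instancePrototype is a hypothetical stand-in: each of the four SDK
// prototype types has its own VolumeAttachments field.
type instancePrototype struct {
	Source            string
	VolumeAttachments []VolumeAttachment
}

// prototypeBuilders is a hypothetical table with one entry per create path.
var prototypeBuilders = map[string]func([]VolumeAttachment) instancePrototype{
	"by-image":         func(a []VolumeAttachment) instancePrototype { return instancePrototype{"image", a} },
	"catalog-offering": func(a []VolumeAttachment) instancePrototype { return instancePrototype{"catalog", a} },
	"by-volume-id":     func(a []VolumeAttachment) instancePrototype { return instancePrototype{"volume", a} },
	"by-snapshot":      func(a []VolumeAttachment) instancePrototype { return instancePrototype{"snapshot", a} },
}

// pathsDroppingAttachments lists (sorted) every create path whose prototype
// does not carry exactly the attachments it was given.
func pathsDroppingAttachments(want []VolumeAttachment) []string {
	var bad []string
	for name, build := range prototypeBuilders {
		if got := build(want).VolumeAttachments; !reflect.DeepEqual(got, want) {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad) // map iteration order is random; keep failures stable
	return bad
}

func main() {
	want := []VolumeAttachment{{CapacityGB: 60, Profile: "sdp"}}
	fmt.Println(pathsDroppingAttachments(want)) // []
}
```

A path that forgot the assignment would show up by name, which is exactly the regression the current suite cannot catch.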

Pre-submission review

Before opening, I ran an automated review pass over the diff — three parallel invocations each of a general code reviewer, a comments reviewer, and a test-coverage reviewer, followed by a simplification pass (reuse / simplification / efficiency / altitude). Findings addressed: tightened two doc comments that overstated guarantees, dropped the sdp + bandwidth combination from the example (bandwidth is system-derived there), made the example's disk detection robust across vd*/nvme* device names, clarified in the README that the volume is a no-op unless mounted, and added a custom iops+bandwidth test case for parity with #151. The simplification pass found no changes worth making that wouldn't break the intentional symmetry with the boot-volume code.

Signed-off per the DCO in CONTRIBUTING.md. Happy to open a tracking issue first if you'd prefer — opening as a draft for early feedback.

The vpc builder could only attach and tune a boot volume. There was no way
to attach additional scratch storage to the builder instance, so every byte a
build writes — package/module caches, downloads, from-source build trees —
lands on the boot volume. Because a VPC custom image is captured from the boot
volume and the export cost scales with how much of that volume has been
written (not the live filesystem), large transient writes inflate both the
captured image and the capture time even after they are deleted.

Add four optional fields — vsi_data_vol_capacity, vsi_data_vol_profile,
vsi_data_vol_iops, vsi_data_vol_bandwidth — that attach a single data volume
to the builder VSI with DeleteVolumeOnInstanceDelete=true, so it is removed
with the throwaway instance and is never part of the captured image (capture
targets the boot volume only). A build can mount it and point its cache/build
directories at it to keep that churn off the boot volume.

The data volume is wired into all four create paths (by-image, catalog
offering, by-volume-id, by-snapshot) via a dataVolumeAttachments helper, and
validated in Config.Prepare with the same rules as the boot volume from IBM#151:
capacity bounds, profile allowlist, iops/bandwidth honored only on custom/sdp,
capacity required when tuned, and no negatives. Regenerated config.hcl2spec.go;
added config_test.go coverage for the validation and the attachment builder;
documented the fields in README and added examples/build.vpc.data.volume.pkr.hcl
showing the guest-side mount that makes the volume effective.

Signed-off-by: Corey Christous <cchristous@confluent.io>
@cchristous marked this pull request as ready for review on June 26, 2026 15:22

@astha-jain (Member) left a comment

Looks good

Outdated review thread on builder/ibmcloud/vpc/config.go
Split the boot- and data-volume validation so iops still requires
'custom' or 'sdp', but bandwidth requires 'sdp' only (custom does not
honor bandwidth). Update apply-side comments and tests to match.

Signed-off-by: Corey Christous <cchristous@confluent.io>
@cchristous force-pushed the feat/vpc-data-volumes branch from b6644a5 to be0287c on June 30, 2026 04:16
Outdated review thread on builder/ibmcloud/vpc/config_test.go
Outdated review thread on examples/build.vpc.data.volume.pkr.hcl
The validation now requires sdp for bandwidth (custom does not honor it),
so propagate that rule to the user-facing surfaces:
- README: vsi_boot/data_vol_bandwidth rows say sdp only (was custom or sdp)
- example: fix backwards comment that pointed bandwidth at the custom profile
- test: switch the data-volume iops+bandwidth case from custom to sdp

Signed-off-by: Corey Christous <cchristous@confluent.io>

@deepakibms (Contributor) left a comment

LGTM

@deepakibms merged commit d20616d into IBM:master on Jun 30, 2026
2 checks passed

3 participants