feat: pre-populate devcontainer cache with local OCI Layout bundles #538
Add Step 4b to the devcontainer post-create script that discovers OCI
Layout bundles from a configurable directory (COMPLYCTL_BUNDLES_DIR,
default /bundles/) and pre-populates the ~/.complytime/policies/ cache.
This enables complyctl generate and scan to work with private policy
bundles without complyctl get or any registry access.
The step is optional and non-blocking -- if no bundles directory exists
or a bundle fails to process, the devcontainer setup continues normally.
Implementation details:
- Bundle name validation (alphanumeric, hyphens, underscores only)
- index.json validation before copying (prevents half-populated cache)
- jq for JSON manipulation (already in Containerfile, no new deps)
- Atomic state.json writes via temp file + mv
- Idempotent complytime.yaml insertion (grep guard prevents duplicates)
- awk for cross-platform YAML line insertion
- Dummy URL pattern localhost:0/policies/{name} passes ValidateOCIRef
but is never contacted by generate/scan
- cp -a for complete recursive copy including dotfiles
No complyctl source code changes required.
Security review (review council):
- Shell injection: eliminated by using jq instead of python3 heredocs
- Path traversal: blocked by bundle name regex validation
- Error isolation: per-bundle continue on failure, no script abort
Documentation:
- docs/TESTING_ENVIRONMENT.md: Private Bundles section with DevPod
mount instructions and workflow
- AGENTS.md: Recent Changes entry added
OpenSpec artifacts: openspec/changes/devcontainer-bundle-cache/
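For readers unfamiliar with the "temp file + mv" pattern listed above, here is a minimal sketch of the same idea in Go. The script itself does this in shell; `writeJSONAtomic`, `demo`, and the payload are hypothetical names used only for illustration, and the real state.json schema is not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// writeJSONAtomic writes v as JSON to path without ever exposing a
// half-written file: the data goes to a temp file in the same
// directory, which is then renamed over the destination.
func writeJSONAtomic(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	// Best-effort cleanup; fails harmlessly once the rename succeeded.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes a placeholder document into a scratch directory and
// returns what ends up on disk.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "state.json")
	// Hypothetical payload; the real state.json schema is not shown here.
	if err := writeJSONAtomic(target, map[string]string{"example": "value"}); err != nil {
		return "", err
	}
	out, err := os.ReadFile(target)
	return string(out), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes the final step atomic on POSIX systems.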
Suggestion: Extend the mock registry to serve mounted policy files
The current approach works and is fine for unblocking your testing workflow -- I'm good to approve it as-is. This is a suggestion for a follow-up improvement (or you can fold it into this PR if you prefer).
The concern
The shell script in Step 4b replicates cache-internal knowledge that lives in Go code -- the
The alternative
The mock registry already has all the machinery needed. Extending this to read from a mounted directory is ~25 lines of Go:

    // seedFromDirectory discovers Gemara policy files from a filesystem
    // directory and registers them in the content store, exactly like
    // the embedded testdata in seedDefaults().
    func (s *contentStore) seedFromDirectory(dir string) {
    	entries, err := os.ReadDir(dir)
    	if err != nil {
    		return // no directory, nothing to do
    	}
    	for _, entry := range entries {
    		if !entry.IsDir() {
    			continue
    		}
    		name := entry.Name()
    		policyDir := filepath.Join(dir, name)
    		catalog, err := os.ReadFile(filepath.Join(policyDir, "catalog.yaml"))
    		if err != nil {
    			log.Printf("WARNING: skipping %s: %v", name, err)
    			continue
    		}
    		policy, err := os.ReadFile(filepath.Join(policyDir, "policy.yaml"))
    		if err != nil {
    			log.Printf("WARNING: skipping %s: %v", name, err)
    			continue
    		}
    		s.addArtifact("policies/"+name, []string{"latest"}, []layerDef{
    			{mediaType: gemaraCatalogType, data: catalog},
    			{mediaType: gemaraPolicyType, data: policy},
    		})
    		log.Printf("Seeded policy from directory: policies/%s", name)
    	}
    }

Then in

    contentDir := os.Getenv("MOCK_REGISTRY_CONTENT_DIR")
    if contentDir == "" {
    	contentDir = "/bundles"
    }
    store.seedFromDirectory(contentDir)

What this changes for
| Aspect | Current (cache bypass) | Registry serving |
|---|---|---|
| Cache format coupling | Shell must match state.json schema + dir layout | None -- complyctl get owns it |
| complyctl get | Fails for local bundles | Works for all policies |
| Digest management | Shell copies blobs, jq writes digest to state | addArtifact() handles everything |
| Maintenance | Script breaks silently if cache internals change | Registry + get flow stays in sync |
| Testing | Manual validation only | seedFromDirectory() is unit-testable |
| Dependencies | Adds jq to shell path | None new -- reuses existing Go code |
| Code size | ~125 lines of shell | ~25 lines of Go + ~15 lines of shell |
| User mental model | "skip get, use generate directly" | Standard workflow for everything |
Note on bundle format
seedPolicyFromFiles() produces split-layer format artifacts (separate catalog + policy layers). If any private policies use the Gemara bundle format (DetectManifestShape() in internal/policy/loader.go), that would need a separate loading path. For standard Gemara catalog + policy pairs, the existing addArtifact() handles everything.
Timing
This can be a follow-up PR -- the current approach works for immediate unblocking. If done as a follow-up, it replaces the current Step 4b entirely.
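To illustrate the "unit-testable" point from the table, here is a rough sketch of how the suggested function could be exercised against a temporary directory. `contentStore`, `layerDef`, `addArtifact`, and both media type constants below are simplified stand-ins rather than the registry's real definitions, and `writeFiles` and `seededRepos` are hypothetical helpers added only for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// Simplified stand-ins for the mock registry's types (assumptions).
type layerDef struct {
	mediaType string
	data      []byte
}

type contentStore struct {
	artifacts map[string][]layerDef
}

func (s *contentStore) addArtifact(repo string, tags []string, layers []layerDef) {
	s.artifacts[repo] = layers
}

// Placeholder media types; the real values are not shown in this thread.
const (
	gemaraCatalogType = "application/vnd.example.catalog+yaml"
	gemaraPolicyType  = "application/vnd.example.policy+yaml"
)

// seedFromDirectory follows the suggestion above, minus logging.
func (s *contentStore) seedFromDirectory(dir string) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return // no directory, nothing to do
	}
	for _, entry := range entries {
		if !entry.IsDir() {
			continue
		}
		name := entry.Name()
		catalog, err := os.ReadFile(filepath.Join(dir, name, "catalog.yaml"))
		if err != nil {
			continue
		}
		policy, err := os.ReadFile(filepath.Join(dir, name, "policy.yaml"))
		if err != nil {
			continue
		}
		s.addArtifact("policies/"+name, []string{"latest"}, []layerDef{
			{mediaType: gemaraCatalogType, data: catalog},
			{mediaType: gemaraPolicyType, data: policy},
		})
	}
}

// writeFiles creates dir and fills it with placeholder YAML files.
func writeFiles(dir string, names ...string) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	for _, n := range names {
		if err := os.WriteFile(filepath.Join(dir, n), []byte("placeholder: true\n"), 0o600); err != nil {
			return err
		}
	}
	return nil
}

// seededRepos seeds a store from one complete and one incomplete
// policy directory and returns the repositories that were registered.
func seededRepos() ([]string, error) {
	root, err := os.MkdirTemp("", "bundles")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	if err := writeFiles(filepath.Join(root, "my-private-policy"), "catalog.yaml", "policy.yaml"); err != nil {
		return nil, err
	}
	// Missing policy.yaml, so this directory should be skipped.
	if err := writeFiles(filepath.Join(root, "incomplete"), "catalog.yaml"); err != nil {
		return nil, err
	}

	s := &contentStore{artifacts: map[string][]layerDef{}}
	s.seedFromDirectory(root)

	repos := make([]string, 0, len(s.artifacts))
	for repo := range s.artifacts {
		repos = append(repos, repo)
	}
	sort.Strings(repos)
	return repos, nil
}

func main() {
	repos, err := seededRepos()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(repos)
}
```

In a real test file the body of `seededRepos` would sit inside a `testing.T` function using `t.TempDir()`; it is written as a plain function here so the example runs on its own.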
Replace the 125-line shell-based cache bypass in Step 4b with seedFromDirectory() in the mock OCI registry. The registry now reads Gemara catalog.yaml and policy.yaml files from a mounted directory and serves them as OCI artifacts alongside the embedded testdata.
This eliminates coupling between the shell script and cache internals (state.json schema, directory layout, digest management). Users mount raw Gemara YAML instead of pre-built OCI Layout bundles, and the standard get -> generate -> scan workflow works for all policies.
Security hardening:
- Validate directory names against ^[a-zA-Z0-9_-]+$ regex
- Reject symlinked directories and files (Lstat-based checks)
- Cap file reads at 10 MB to prevent resource exhaustion
- Trust model documented in code comments
Addresses: complytime#538 (comment)
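The hardening items in this commit message could look roughly like the sketch below. The names `validBundleName`, `readFileLimited`, and `maxPolicyFileSize` appear elsewhere in this PR, but their signatures and bodies here are assumptions, and "10 MB" is interpreted as 10 MiB.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"regexp"
)

// validBundleName mirrors the regex quoted in the commit message.
var validBundleName = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

// maxPolicyFileSize is the read cap (assumed here to mean 10 MiB).
const maxPolicyFileSize = 10 << 20

// readFileLimited reads a regular file of at most limit bytes.
// Symlinks and other non-regular files are rejected via Lstat,
// which does not follow links.
func readFileLimited(path string, limit int64) ([]byte, error) {
	info, err := os.Lstat(path)
	if err != nil {
		return nil, err
	}
	if !info.Mode().IsRegular() {
		return nil, fmt.Errorf("%s: not a regular file", path)
	}
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// Read one byte past the limit so oversized files are detectable.
	data, err := io.ReadAll(io.LimitReader(f, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, fmt.Errorf("%s: exceeds %d bytes", path, limit)
	}
	return data, nil
}

func main() {
	for _, name := range []string{"my-private-policy", "../escape", "has space"} {
		fmt.Printf("%q valid: %v\n", name, validBundleName.MatchString(name))
	}
	if _, err := readFileLimited("/does/not/exist", maxPolicyFileSize); err != nil {
		fmt.Println("read rejected:", err)
	}
}
```

There is a small window between the `Lstat` check and the `Open` call in this sketch, which is acceptable only under the operator-controlled trust assumption the commit message mentions.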
22e6378 to 6edb97a
Combine MOCK_REGISTRY_CONTENT_DIR env var from this branch with nohup + log file redirect improvements from upstream/main.
marcusburghardt
left a comment
Summary
The registry-serving approach is a strong improvement — seedFromDirectory() cleanly reuses the existing addArtifact() machinery, and the test coverage (12 tests including security edge cases) is thorough. One functional bug must be fixed: the policy URLs inserted into complytime.yaml are missing the http:// scheme prefix, which will cause complyctl get to attempt HTTPS against the plainHTTP mock registry.
Additionally, the OpenSpec artifacts (design.md, proposal.md, spec.md, tasks.md) still describe the original cache-bypass approach (commit 1) and were not updated for the registry-serving refactor (commit 2). Key discrepancies: python3 references, localhost:0 URLs, "no mock registry changes" non-goal, OCI Layout input format, state.json manipulation. These should be updated to reflect the implemented approach.
This review was generated by /review-pr (AI-assisted).
- Add missing http:// scheme prefix to policy URLs in post-create.sh (grep guard, awk insertion, printf fallback) so complyctl get uses plainHTTP mode against the mock registry
- Fix http:// prefix in docs/TESTING_ENVIRONMENT.md for consistency
- Rename TestSeedFromDirectory_DoesNotOverrideDefaults to TestSeedFromDirectory_OverwritesExistingRepo to match assertion
- Update AGENTS.md recent changes entry to describe registry-serving approach instead of cache-bypass
- Update OpenSpec artifacts (proposal, design, tasks) to reflect the implemented registry-serving approach
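The review describes the scheme prefix as the signal that selects plain HTTP. How complyctl actually parses policy URLs is not shown in this thread, so the following is only an illustration of that kind of rule; `parsePolicyURL` is a hypothetical function, not complyctl's API.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePolicyURL splits a policy URL into an OCI-style reference and a
// flag saying whether plain HTTP should be used. Hypothetical logic:
// only an explicit http:// prefix opts out of TLS.
func parsePolicyURL(raw string) (ref string, plainHTTP bool) {
	if strings.HasPrefix(raw, "http://") {
		return strings.TrimPrefix(raw, "http://"), true
	}
	return strings.TrimPrefix(raw, "https://"), false
}

func main() {
	for _, raw := range []string{
		"http://localhost:8765/policies/my-private-policy",
		"localhost:8765/policies/my-private-policy",
	} {
		ref, plain := parsePolicyURL(raw)
		fmt.Printf("%-50s ref=%s plainHTTP=%v\n", raw, ref, plain)
	}
}
```

Under a rule like this, the second input would be contacted over HTTPS, which matches the failure mode the review describes for the mock registry.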
marcusburghardt
left a comment
LGTM. Thanks @hbraswelrh
@jpower432 once this PR is merged the complyctl release can help with the dependent PRs:
…erge
Resolved conflicts in 5 files caused by PR complytime#538 (devcontainer-bundle-cache) merging into main:
- cmd/mock-oci-registry/main.go: Keep both 'path' (for embed.FS) and 'path/filepath' + 'regexp' (for seedFromDirectory) imports
- cmd/mock-oci-registry/main_test.go: Keep all tests from both branches (buildTarGzFromFS/seedDefaults + seedFromDirectory)
- .devcontainer/scripts/post-create.sh: Keep inline deployment generation (avoids shipping K8s manifest in testdata which triggers security scanner false positives)
- AGENTS.md: Keep OPA entry from this PR + upstream's updated devcontainer-bundle-cache entry
- docs/TESTING_ENVIRONMENT.md: Keep OPA command reference section and accept upstream's http:// URL fix
Summary
Add optional private policy support to the devcontainer by
extending the mock OCI registry to serve mounted Gemara YAML
files. This enables complyctl get, complyctl generate, and
complyctl scan to work with private policy bundles through the
standard workflow, with no cache bypass or registry access required.
Problem
Private policy bundles cannot be committed to GitHub or embedded
in the mock OCI registry testdata without exposing their content.
Users evaluating private compliance policies in the devcontainer
need the full get -> generate -> scan pipeline to work with
local bundles that never touch a registry or the repository.
Solution
The mock OCI registry gains a seedFromDirectory() method that
discovers Gemara policy files from a mounted directory and serves
them as OCI artifacts alongside the embedded testdata. The
post-create script adds policy entries to complytime.yaml
pointing at the mock registry, so complyctl get populates the
cache through normal code paths.
User-facing input format
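Users mount raw Gemara YAML files (not OCI Layout bundles), one subdirectory per policy, each containing catalog.yaml and policy.yaml:

    /bundles/
      my-private-policy/
        catalog.yaml
        policy.yaml
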
How it works
- Mock registry (seedFromDirectory()): At startup, reads MOCK_REGISTRY_CONTENT_DIR (default /bundles/), discovers subdirectories containing catalog.yaml + policy.yaml, and registers them via the existing addArtifact() machinery.
- The post-create script appends policy entries to complytime.yaml pointing at localhost:8765/policies/{name}.
- MOCK_REGISTRY_CONTENT_DIR is passed to the mock registry process.
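As a rough illustration of the wiring described above, the sketch below resolves the content directory with its default and builds the policy URL for a mounted policy. `contentDir`, `policyURL`, and the two constants are hypothetical names. The URL includes the http:// prefix that the review in this thread asked for, which the description text above omits.

```go
package main

import (
	"fmt"
	"os"
)

const (
	defaultContentDir = "/bundles"       // default named in the PR description
	registryAddr      = "localhost:8765" // mock registry address from the PR description
)

// contentDir returns MOCK_REGISTRY_CONTENT_DIR, or the default when the
// variable is unset or empty. getenv is injected to keep this testable.
func contentDir(getenv func(string) string) string {
	if dir := getenv("MOCK_REGISTRY_CONTENT_DIR"); dir != "" {
		return dir
	}
	return defaultContentDir
}

// policyURL builds the complytime.yaml URL for a mounted policy name.
func policyURL(name string) string {
	return "http://" + registryAddr + "/policies/" + name
}

func main() {
	fmt.Println("content dir:", contentDir(os.Getenv))
	fmt.Println("policy URL: ", policyURL("my-private-policy"))
}
```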
Existing workflow preserved
The mock registry workflow is completely unaffected. Both embedded
testdata and mounted policies are served by the same registry:
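| Command | Embedded testdata (test-ampel-bp) | Mounted policy (my-private-policy) |
|---|---|---|
| complyctl get | works | works |
| complyctl generate | works | works |
| complyctl scan | works | works |
Security hardening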
- Directory names validated against ^[a-zA-Z0-9_-]+$ in both Go and shell (consistent)
- entry.Type() skips symlinked directories; os.Lstat rejects symlinked files
- readFileLimited() caps file reads at 10 MB to prevent OOM from oversized or adversarial files
- os.ReadDir returns base names only; paths constructed via filepath.Join with hardcoded filenames
- Trust model documented in code comments (operator-controlled trust assumption)
- … validations or //nolint with rationale
Files Changed
- cmd/mock-oci-registry/main.go: seedFromDirectory(), readFileLimited(), validBundleName regex, maxPolicyFileSize constant, MOCK_REGISTRY_CONTENT_DIR env var
- cmd/mock-oci-registry/main_test.go
- .devcontainer/scripts/post-create.sh
- docs/TESTING_ENVIRONMENT.md: … get workflow, seedFromDirectory() explanation
Comparison to previous approach
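| Aspect | Previous approach (cache bypass) | Registry serving |
|---|---|---|
| Cache format coupling | Shell must match state.json schema + dir layout | None -- complyctl get owns it |
| complyctl get | Fails for local bundles | Works for all policies |
| Digest management | Shell copies blobs, jq writes digest to state | addArtifact() handles everything |
| Maintenance | Script breaks silently if cache internals change | Registry + get flow stays in sync |
| Dependencies | … jq in container | … |
| User mental model | "skip get, use generate directly" | Standard workflow for everything |
Related Issues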
OpenSpec Artifacts
Spec artifacts at openspec/changes/devcontainer-bundle-cache/ --
all complete (proposal, design, specs/bundle-cache-prepopulation,
tasks).