CNF-23534: Add automated OpenAPI schema generation for PolicyGenerator list merging#787
CNF-23534: Add automated OpenAPI schema generation for PolicyGenerator list merging#787irinamihai wants to merge 3 commits into
Conversation
CNF-23534: Add an openapi schema for all the lists in the reference configuration
Description:
Add automated OpenAPI schema generation for PolicyGenerator list merging
Two-step pipeline to dynamically generate schema.openapi files from
upstream CRDs, replacing the previously hand-crafted schemas:
1. generate-schema-config.py scans Subscription CRs to update
crd-schema-config.json with current channels and git refs
2. extract-schema.py downloads CRDs and extracts minimal merge
directive schemas for ACM PolicyGenerator
Adds nightly GitHub Actions workflow to auto-PR when schemas drift.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: irinamihai The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds tooling and CI to generate ACM PolicyGenerator OpenAPI merge-directive schemas: a JSON CRD config, Python CLIs to extract and process CRD schemas, Makefile targets to orchestrate generation/checking, generated schema.openapi artifacts for telco components, documentation, and a scheduled workflow that opens automated PRs. ChangesOpenAPI Schema Generation Infrastructure
Sequence Diagram(s)sequenceDiagram
actor Developer
participant Make
participant GenerateConfig
participant ExtractSchema
participant Git
participant Workflow
Developer->>Make: make generate-openapi-schemas
Make->>GenerateConfig: run hack/generate-schema-config.py
GenerateConfig->>GenerateConfig: scan subscriptions / detect OCP version
GenerateConfig->>Make: write updated hack/crd-schema-config.json
Make->>ExtractSchema: run hack/extract-schema.py (telco-ran)
ExtractSchema->>ExtractSchema: fetch CRDs, convert, inject merge keys, prune
ExtractSchema->>Git: write telco-ran schema.openapi
Make->>ExtractSchema: run hack/extract-schema.py (telco-core)
ExtractSchema->>Git: write telco-core schema.openapi
Workflow->>Make: (daily) make generate-openapi-schemas
Workflow->>Git: git diff --exit-code
Workflow->>Workflow: create PR if schemas changed
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@irinamihai: This pull request references CNF-23534 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold for further testing |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
.github/workflows/openapi-schema-check.yml (1)
25-25: ⚡ Quick winPin PyYAML to a specific version for reproducibility.
Installing PyYAML without a version constraint means the workflow will fetch whatever version is latest at runtime. If PyYAML releases breaking changes, the workflow could fail unpredictably.
📌 Proposed fix to pin PyYAML version
- pip install PyYAML + pip install PyYAML==6.0.1Update the version number as needed to match the version tested with your scripts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/openapi-schema-check.yml at line 25, Replace the unpinned installer command "pip install PyYAML" in the GitHub Actions job with a pinned version (e.g., "pip install PyYAML==<version>") to ensure reproducible runs; pick and lock the exact PyYAML version you tested against and update that version string in the workflow step that currently runs pip install PyYAML.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/openapi-schema-check.yml:
- Line 19: Replace semantic version tags for GitHub Actions with pinned commit
SHAs to satisfy supply-chain requirements: update usages of actions/checkout@v4
and peter-evans/create-pull-request@v7 to their corresponding full commit SHAs
(e.g., actions/checkout@<sha> and peter-evans/create-pull-request@<sha>), fetch
the latest stable commit SHAs from each action's repository first, and commit
the workflow changes so the workflow references the fixed SHAs instead of the
version tags.
- Around line 23-24: The workflow currently downloads yq_linux_amd64 and makes
it executable without integrity checks; update the steps around the curl and
chmod so you first fetch or define an expected checksum (e.g.,
YQ_CHECKSUM/YQ_VERSION), verify the downloaded file's SHA256 (or other selected
hash) against that expected value using a verification tool (sha256sum or shasum
-a 256) and only call chmod +x /usr/local/bin/yq and move it into the PATH if
the checksum matches; ensure the job fails if verification fails and document
updating YQ_VERSION and YQ_CHECKSUM to the intended release.
In `@hack/extract-schema.py`:
- Around line 236-240: The branch that handles "x-kubernetes-patch-strategy" can
KeyError on schema["x-kubernetes-patch-merge-key"]; update the logic in the
block that builds result (the code using the local variable schema and creating
result) to read the merge key via schema.get("x-kubernetes-patch-merge-key") and
only include the "x-kubernetes-patch-merge-key" entry in the result when that
get() returns a non-None value (keep the "x-kubernetes-patch-strategy" entry as
before and preserve the default for "type").
In `@hack/generate-schema-config.py`:
- Around line 66-70: The aggregation logic that populates results[pkg_name] with
"channel" and "components" silently ignores when the same spec.name (pkg_name)
appears with a different channel; update the block that checks pkg_name and
channel (where results is built and
results[pkg_name]["components"].add(component) is called) to detect if
results[pkg_name]["channel"] already exists and differs from the current
channel, and on that conflict raise an error (or exit non‑zero) with a clear
message referencing the spec.name/pkg_name and both channels; do not merge
components across differing channels.
---
Nitpick comments:
In @.github/workflows/openapi-schema-check.yml:
- Line 25: Replace the unpinned installer command "pip install PyYAML" in the
GitHub Actions job with a pinned version (e.g., "pip install PyYAML==<version>")
to ensure reproducible runs; pick and lock the exact PyYAML version you tested
against and update that version string in the workflow step that currently runs
pip install PyYAML.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 00891ba2-3175-4320-88cf-d34e51e42032
📒 Files selected for processing (11)
.github/workflows/openapi-schema-check.ymlMakefileVERSION_UPDATE_GUIDE.mdhack/crd-schema-config.jsonhack/extract-schema.pyhack/generate-schema-config.pytelco-core/configuration/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/README.mdtelco-ran/configuration/argocd/example/acmpolicygenerator/hub-side-templating/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/hub-side-templating/schema.openapitelco-ran/configuration/argocd/example/acmpolicygenerator/schema.openapi
| if pkg_name and channel: | ||
| if pkg_name not in results: | ||
| results[pkg_name] = {"channel": channel, "components": set()} | ||
| results[pkg_name]["components"].add(component) | ||
|
|
There was a problem hiding this comment.
Detect and fail on cross-component channel conflicts per package.
If the same spec.name is found with different channels, current logic silently keeps the first channel and merges components, which can generate wrong refs and schemas.
Proposed fix
if pkg_name and channel:
if pkg_name not in results:
results[pkg_name] = {"channel": channel, "components": set()}
+ elif results[pkg_name]["channel"] != channel:
+ raise RuntimeError(
+ f"Conflicting channels for package '{pkg_name}': "
+ f"'{results[pkg_name]['channel']}' vs '{channel}'"
+ )
results[pkg_name]["components"].add(component)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hack/generate-schema-config.py` around lines 66 - 70, The aggregation logic
that populates results[pkg_name] with "channel" and "components" silently
ignores when the same spec.name (pkg_name) appears with a different channel;
update the block that checks pkg_name and channel (where results is built and
results[pkg_name]["components"].add(component) is called) to detect if
results[pkg_name]["channel"] already exists and differs from the current
channel, and on that conflict raise an error (or exit non‑zero) with a clear
message referencing the spec.name/pkg_name and both channels; do not merge
components across differing channels.
Description: - check openapi schema workflow - security hardening - warning for channel clash across core vs ran - README updates
|
/unhold |
| contents: write | ||
| pull-requests: write |
There was a problem hiding this comment.
Are there finer grained permissions that can be granted. These are pretty broad.
There was a problem hiding this comment.
These are already the minimum required permissions for peter-evans/create-pull-request - contents: write to push the commit and pull-requests: write to create the PR. This matches the permissions used by doc-updater.yml in this repo. There are no finer-grained alternatives for this use case where we want to automatically create PRs.
What restrictions/rules did you have in mind?
| YQ_VERSION="v4.53.2" | ||
| YQ_SHA256="d56bf5c6819e8e696340c312bd70f849dc1678a7cda9c2ad63eebd906371d56b" | ||
| curl -sL "https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/yq_linux_amd64" -o /usr/local/bin/yq | ||
| echo "${YQ_SHA256} /usr/local/bin/yq" | sha256sum -c - |
There was a problem hiding this comment.
Will PolicyGenerator follow the redirect path?
Also nit: newline
There was a problem hiding this comment.
PolicyGenerator rejects symlinks that resolve outside the directory tree. The file is kept as a regular copy of the parent schema.openapi. The following work to restructure the directories to resemble telco-core plus the removal of the non templated PolicyGenerator examples will consolidate this so there's a single schema.openapi co-located with the PolicyGenerator CRs.
For now, I'm going to leave the full content in this file and update the Makefile to copy the generated schema in here.
| "Static fields (source, merge_keys, version, package_name, ref_rule)", | ||
| "are manually maintained.", | ||
| "Dynamic fields (subscription_channel, components, source.github.ref)", | ||
| "are updated by: python3 hack/generate-schema-config.py", |
There was a problem hiding this comment.
Would it make sense to separate the static and dynamic/derived data into separate files? Perhaps yaml file listing the static (human generated) data.
Do we need persistence of the dynamic data, or is it for archival/informational purposes (ie a log of what was found)?
There was a problem hiding this comment.
The dynamic fields (subscription_channel, components, source.github.ref) are needed by extract-schema.py to know which GitHub branch to fetch CRDs from - they're not just informational. generate-schema-config.py updates them in-place while preserving the static fields (source.github.{owner,repo,path},merge_keys, version, ref_rule).
Splitting into two files would add complexity to the merge/read logic, since the round-trip update preserves static fields and git diff shows exactly what changed. The _comment block at the top documents which fields are static vs dynamic.
We can revisit if the config grows significantly, but for 6 CRD entries the single file keeps things simple.
WDYT?
… restructure is complete
Add automated OpenAPI schema generation for PolicyGenerator list merging
Two-step pipeline to dynamically generate schema.openapi files from upstream CRDs, replacing the previously hand-crafted schemas:
Adds nightly GitHub Actions workflow to auto-PR when schemas drift.