OTA-1963: skills: Add cluster-update skills (update-advisor, product-lifecycle)#6
| ## Endpoint | ||
|
|
||
| ``` | ||
| GET https://access.redhat.com/product-life-cycles/api/v1/products?name=<substring> |
There was a problem hiding this comment.
nit: there's now a v2 lifecycle API at https://access.redhat.com/product-life-cycles/api/v2/products . It has a slightly different schema, e.g.:
$ diff -u3 <(curl -s 'https://access.redhat.com/product-life-cycles/api/v1/products?name=OpenShift+Container+Platform' | jq -S .) <(curl -s 'https://access.redhat.com/product-life-cycles/api/v2/products?name=OpenShift+Container+Platform' | jq -S .) | head -n30
--- /dev/fd/63 2026-04-21 13:36:54.495565201 -0700
+++ /dev/fd/62 2026-04-21 13:36:54.495565201 -0700
@@ -61,27 +61,23 @@
"is_retired": false,
"link": "https://access.redhat.com/support/policy/updates/openshift/",
"name": "Red Hat OpenShift Container Platform",
+ "opl_uuid": null,
"package": null,
"policies": "https://access.redhat.com/site/support/policy/updates/openshift/policies/",
"release_cadence": "4 months ",
"show_final_minor_release": false,
- "show_last_minor_release": false,
"show_openshift_compatibility": false,
"uuid": "9bbc4758-50e0-4b73-89dc-2bae80f1d394",
"versions": [
{
"additional_text": "",
"extra_dependences": [],
- "extra_header_value": null,
"final_minor_release": null,
- "last_minor_release": null,
"name": "4.21",
"openshift_compatibility": null,
"phases": [
{
"additional_text": "",
- "date": "2026-02-03T00:00:00.000Z",
- "date_format": "date",
"end_date": "2026-02-03T00:00:00.000Z",
"end_date_format": "date",
We probably want the skill to use the v2 API, but I'm agnostic about whether we port from v1 to v2 in this pull or, after merging this as it stands, in follow-up pulls.
There was a problem hiding this comment.
Thanks, we will introduce v2 usage in #13
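A minimal sketch of how a consumer might tolerate both schemas while the port is pending. It relies only on what the diff above shows (`date` and `date_format` appear on the v1 side only, `end_date` appears as unchanged context); the helper name `phase_end_date` and the sample phases are hypothetical, not part of the skill.

```python
from typing import Optional


def phase_end_date(phase: dict) -> Optional[str]:
    """Return a phase's end date from either a v1- or v2-shaped phase object."""
    # `end_date` is an unchanged context line in the diff, so it is expected
    # in both versions; `date` only appears on the v1 side.
    return phase.get("end_date") or phase.get("date")


# Hand-written sample phases modelled on the diff, not live API output.
v1_phase = {
    "date": "2026-02-03T00:00:00.000Z",
    "date_format": "date",
    "end_date": "2026-02-03T00:00:00.000Z",
    "end_date_format": "date",
}
v2_phase = {
    "end_date": "2026-02-03T00:00:00.000Z",
    "end_date_format": "date",
}

print(phase_end_date(v1_phase) == phase_end_date(v2_phase))
```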
| |---|---| | ||
| | `"Full Support"` | Active development, bug fixes, security patches | | ||
| | `"Maintenance Support"` | Critical/security fixes only, no new features | | ||
| | `"End of life"` | No fixes, no support — must upgrade | |
There was a problem hiding this comment.
$ curl -s https://access.redhat.com/product-life-cycles/api/v1/products | jq -r '.data[].versions[].type' | sort | uniq -c | sort -n
12
39 End of Maintenance
119 Extended Support
156 Maintenance Support
257 Full Support
875 End of life
I'm not clear on what End of Maintenance means. Extended Support is similar to Maintenance Support, but might come with additional restrictions like the need to purchase add-ons. Would be nice if there were OpenAPI schemas for this endpoint, and maybe there are, and I'm just not aware of them.
There was a problem hiding this comment.
Good catch. I've updated the table to include all six type values the API actually returns: Full Support, Maintenance Support, End of Maintenance, Extended Support, End of life, and empty string. Verified against the live API.
Agreed on the OpenAPI schemas, I'm not aware of any either.
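For environments without jq, the same tally can be sketched in Python. This assumes only the response shape implied by the jq path above (`data[].versions[].type`); the function name and the sample payload are hypothetical, and the sample is hand-written rather than captured from the API.

```python
from collections import Counter


def count_version_types(payload: dict) -> Counter:
    """Count the `type` value of every version of every product."""
    return Counter(
        version.get("type", "")
        for product in payload.get("data", [])
        for version in product.get("versions", [])
    )


# Hand-written sample in the shape implied by the jq path, not live data.
sample = {
    "data": [
        {"versions": [{"type": "Full Support"}, {"type": "End of life"}]},
        {"versions": [{"type": "End of life"}, {"type": ""}]},
    ]
}

counts = count_version_types(sample)
for type_name, count in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{count:7d} {type_name!r}")
```

Printing with `!r` keeps the empty-string type visible, which is easy to miss in the plain `uniq -c` output.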
|
|
||
| | Field | Type | Description | | ||
| |---|---|---| | ||
| | `name` | string | Phase name (e.g., `"General availability"`, `"Full support"`, `"Maintenance support"`) | |
There was a problem hiding this comment.
$ curl -s https://access.redhat.com/product-life-cycles/api/v1/products | jq -r '.data[].versions[].phases[].name' | sort | uniq -c | sort -n
3 Migration support
5 Retired
18 Extended life cycle support (ELS) Term 3 add-on
18 Third-party certification period
26 Maintenance support 2
29 Extended life cycle support (ELS) Term 2 add-on
36 Maintenance Support 1
38 Extended life cycle support (ELS) 2
75 Extended life cycle support (ELS) add-on
85 Extended life cycle support (ELS) 1
125 Extended life phase
298 Extended update support Term 3
331 End of Life
453 Extended update support Term 2
583 Extended update support
1161 Maintenance support
1289 Full support
1458 General availability
Maybe the API needs a link from each phase to docs about what that phase means for that product? Because that seems like a lot of phases that aren't all that clear to me as someone not terribly familiar with a bunch of these products.
There was a problem hiding this comment.
agree, I will update that file. Thanks.
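The listing above also shows labels that differ only in capitalization (for example the phase name "End of Life" versus the type value "End of life", and "Maintenance Support 1" next to "Maintenance support 2"). A small sketch of a case-insensitive comparison, assuming a consumer wants to treat such spellings as the same label; the helper name is hypothetical.

```python
def same_label(a: str, b: str) -> bool:
    """Compare two lifecycle labels, ignoring case and surrounding whitespace."""
    return a.strip().casefold() == b.strip().casefold()


print(same_label("End of Life", "End of life"))
print(same_label("Maintenance Support 1", "Maintenance support 2"))
```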
| ## Search Tips | ||
|
|
||
| 1. **Be specific with `?name=`** — `"logging+for+Red+Hat+OpenShift"` is better than `"logging"` | ||
| 2. **Try former names** — if `"OpenShift Logging"` returns nothing, the product may have been renamed |
There was a problem hiding this comment.
and former_names didn't get populated? Seems like this would be a server-side bug, and I dunno if we want to try and coach the clients around it.
|
/retitle OTA-1963: WIP: skills: Add cluster-update skills (update-advisor, product-lifecycle) |
|
@harche: This pull request references OTA-1963 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| @@ -0,0 +1,222 @@ | |||
| --- | |||
| name: product-lifecycle | |||
| description: Query Red Hat Product Life Cycle data (PLCC) for support phases, end-of-life dates, and OpenShift version compatibility. Use when evaluating whether installed operators or layered products are supported on a given OCP version, approaching end of life, or need upgrading before a cluster upgrade. Also use when the user asks about product support status, EOL dates, or lifecycle phases for any Red Hat product. | |||
There was a problem hiding this comment.
Nit, I dug around a bit, and seems like PLCC used to stand for "Product Life Cycle Checker". Not sure that acronym is all that useful for AI though. Maybe just use PLC? Or drop the acronym entirely?
| --- | ||
| name: product-lifecycle | ||
| description: Query Red Hat Product Life Cycle data (PLCC) for support phases, end-of-life dates, and OpenShift version compatibility. Use when evaluating whether installed operators or layered products are supported on a given OCP version, approaching end of life, or need upgrading before a cluster upgrade. Also use when the user asks about product support status, EOL dates, or lifecycle phases for any Red Hat product. | ||
| allowed-tools: Bash(curl:*) |
There was a problem hiding this comment.
Spec:
h2. `allowed-tools` field
The optional `allowed-tools` field:
- A space-separated string of tools that are pre-approved to run
- Experimental. Support for this field may vary between agent implementations
Example: allowed-tools: Bash(git:*) Bash(jq:*) Read
That seems... not all that useful (e.g. what is the significance of the :*?). What does Read mean? The Experimental warning suggests they're not all that confident in being able to cover what the various implementations might choose to do with this anyway. Dropping to Claude docs:
h2. Pre-approve tools for a skill
The `allowed-tools` field grants permission for the listed tools while the skill is active, so Claude can use them without prompting you for approval. It does not restrict which tools are available: every tool remains callable, and your permission settings still govern tools that are not listed.
This skill lets Claude run git commands without per-use approval whenever you invoke it:
name: commit
description: Stage and commit the current changes
disable-model-invocation: true
allowed-tools: Bash(git add *) Bash(git commit *) Bash(git status *)
To block a skill from using certain tools, add deny rules in your permission settings instead.
And then over here:
The space before `*` matters: `Bash(ls *)` matches `ls -la` but not `lsof`, while `Bash(ls*)` matches both. The `:*` suffix is an equivalent way to write a trailing wildcard, so `Bash(ls:*)` matches the same commands as `Bash(ls *)`.
But that might all be Claude-specific, because I don't see any mention of allowed-tools in the OpenAI skills docs? And does it matter for our planned Lightspeed integration? Who would the Lightspeed harness be asking to approve tool use?
There was a problem hiding this comment.
ok, I will drop allowed-tools, thanks.
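For readers puzzling over the wildcard wording quoted above, here is a rough illustration of the stated rules using Python's `fnmatch`. It is only a model of the quoted sentences, not how any agent actually evaluates `allowed-tools`; the function name `rule_matches` is hypothetical.

```python
from fnmatch import fnmatchcase


def rule_matches(rule: str, command: str) -> bool:
    """Model the quoted wildcard wording; not any agent's real matcher."""
    # Per the quoted docs, a `:*` suffix is equivalent to a trailing ` *`.
    if rule.endswith(":*"):
        rule = rule[:-2] + " *"
    return fnmatchcase(command, rule)


for rule in ("ls *", "ls*", "ls:*"):
    for command in ("ls -la", "lsof"):
        print(f"Bash({rule}) vs {command!r}: {rule_matches(rule, command)}")
```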
| ```bash | ||
| # Search for a product by name (substring match) | ||
| curl -s "https://access.redhat.com/product-life-cycles/api/v1/products?name=logging+for+Red+Hat+OpenShift" \ | ||
| | python3 -m json.tool |
There was a problem hiding this comment.
What tooling will Lightspeed's evaluation engine have available? I personally use jq instead of Python for processing responses like this, but I can easily imagine both systems with only jq, systems with only Python, systems with both, and systems with neither. Or is the AI engine smart enough to look at available tools and map from Python skill suggestions to jq if only jq is available?
There was a problem hiding this comment.
I agree, although python is heavily used by agents for code mode operations, in this case jq makes more sense, I will update those examples to use jq. Thanks.
There was a problem hiding this comment.
Resolved by 23c5759 -> 804b96b moving from python3 to jq; thanks 👍 There's still some ambiguity for me (also here and here) on what kind of contract agent images are trying to meet, so we can write skills that target that environment. But we don't have to hold this GitHub thread open to sort that out.
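Whichever tool ends up processing the response, the query string itself has to be encoded the way the curl example does (spaces as `+`). A small sketch using only the Python standard library, with the v1 endpoint from this thread; the helper name `products_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://access.redhat.com/product-life-cycles/api/v1/products"


def products_url(name: str) -> str:
    """Build the search URL for a product-name substring."""
    # urlencode() encodes spaces as "+", matching the curl example.
    return f"{BASE_URL}?{urlencode({'name': name})}"


print(products_url("logging for Red Hat OpenShift"))
```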
|
|
||
| ## Response Structure | ||
|
|
||
| Each product in `data[]` has: |
There was a problem hiding this comment.
Seems like it overlaps with api-details.md? Is the goal here just to cover some of the highlights of the response format, without swamping with context? Maybe we should suggest using a sub-agent that has read the whole api-details.md file, to try to preserve that context capacity, at the cost of restricting use to agents that can launch sub-agents? Or offload PLC queries to an MCP tool, instead of using a Skill? Or...?
There was a problem hiding this comment.
Thanks, you're right. There was genuine duplication here and it was already causing drift (the type field table in SKILL.md was stale). I've removed the duplicated Key fields and Phase date formats sections from SKILL.md, keeping just the JSON example and a pointer to api-details.md.
The intent is progressive disclosure. I want to give the agent just enough to work with for typical queries without loading the full schema into context. If the agent needs the full field details, type enumerations, or phase name breakdown, it can read api-details.md on demand.
| --- | ||
| name: openshift-cluster-update-advisor | ||
| description: Assess OpenShift cluster update readiness and risk. Use when evaluating whether a cluster is safe to update, when an update is available, or when the user asks about update risks, prerequisites, blockers, or best practices. | ||
| name: ota-upgrade-advisor |
There was a problem hiding this comment.
nit: "ota" is part of the Red-Hat-internal team name (Over the Air Updates), but personally, I don't see anything all that over-the-air about cluster updates, and I'd suggest focusing on the subject (an OpenShift cluster), action (update), and what the skill is bringing to that subject+action event (advice).
There was a problem hiding this comment.
ok, thanks. I will change it to Cluster Update Advisor
| name: openshift-cluster-update-advisor | ||
| description: Assess OpenShift cluster update readiness and risk. Use when evaluating whether a cluster is safe to update, when an update is available, or when the user asks about update risks, prerequisites, blockers, or best practices. | ||
| name: ota-upgrade-advisor | ||
| description: Assess OpenShift cluster upgrade readiness and risk. Use when evaluating whether a cluster is safe to upgrade, when an upgrade is available, or when the user asks about upgrade risks, prerequisites, blockers, or best practices. |
There was a problem hiding this comment.
nit: some projects make distinctions between "upgrade" and "update" (e.g. using one for patch fixes with another for feature bumps). OpenShift does not make that distinction. We try to consolidate on "update" (e.g. in openshift/openshift-docs#43138) with the hope that folks seeing "update" consistently used will not get distracted by wondering if there's an upgrade-vs.-update distinction. Not clear to me if that actually works out for us, or just leads to more discussions trying to herd folks towards "update" 😅 . But personally I'd stick to update as much as possible, and throw in one "upgrade" reference early on just to help searching robots string-match regardless of which word the requestor had used.
| The proposal request contains: | ||
| - Current and target version metadata | ||
| - Channel and update path information | ||
| - **Cluster readiness JSON** — pre-collected by CVO with results from 9 parallel checks |
There was a problem hiding this comment.
nit: we probably don't want to commit to a specific number of checks, since that can evolve independently in the CVO repository. And the fact that the checks are parallel is interesting for CVO latency/efficiency reasons, but not all that relevant for the consumer, and we probably don't need to cover that here. Also, the fact that it could be the CVO collecting these checks doesn't matter for the skill either; other folks besides the CVO could be mounting this skill into Lightspeed Proposal structures or using it to generate update advice. Maybe something like:
| - **Cluster readiness JSON** — pre-collected by CVO with results from 9 parallel checks | |
| - Cluster readiness JSON: additional cluster health checks with context that may be relevant to preparing for any update, or context that is particularly relevant to a selected target version. |
| "total_checks": 9, | ||
| "checks_ok": 9, | ||
| "checks_errored": 0, | ||
| "elapsed_seconds": 0.65 |
There was a problem hiding this comment.
nit: This metadata gives room for inconsistent statements, e.g.:
{
"current_version": "4.21.5",
"target_version": "4.21.8",
"checks": {},
"meta": {
"total_checks": 1,
"checks_ok": 2,
"checks_errored": 3,
"elapsed_seconds": 0.65
}
}
claims more checks than exist under checks, and even more ok and errored than the total. If the client is parsing the JSON anyway, it's less parsing, and less room for inconsistency, if we just send checks and don't attempt to summarize in meta.
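A sketch of what deriving the summary on the client side could look like, so the counts cannot disagree with the checks themselves. It assumes `checks` is an object keyed by check name and that each check carries the `_status` field (`ok` or `error`) that the skill text describes; the function name and the check names below are hypothetical.

```python
from collections import Counter


def summarize_checks(readiness: dict) -> dict:
    """Derive summary counts from `checks` instead of trusting `meta`."""
    statuses = Counter(
        check.get("_status", "unknown")
        for check in readiness.get("checks", {}).values()
    )
    return {
        "total_checks": sum(statuses.values()),
        "checks_ok": statuses["ok"],
        "checks_errored": statuses["error"],
    }


# Hypothetical check names; only the `_status` values come from the skill text.
readiness = {
    "current_version": "4.21.5",
    "target_version": "4.21.8",
    "checks": {
        "example_check_a": {"_status": "ok"},
        "example_check_b": {"_status": "error"},
    },
}
print(summarize_checks(readiness))
```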
| } | ||
| ``` | ||
|
|
||
| Each check contains `_status` (`ok` or `error`), `_elapsed_seconds`, and |
There was a problem hiding this comment.
nit: you don't have _elapsed_seconds in the checks[] example above. I'm not clear on what the client would do with that information anyway, maybe just report a status boolean-ish and summary string?
| Each check contains `_status` (`ok` or `error`), `_elapsed_seconds`, and | ||
| check-specific data with a `summary` section for quick parsing. | ||
|
|
||
| ### What the checks cover |
There was a problem hiding this comment.
nit: do we need to inline these here? It gives us two places (CVO repo and here) that need bumping if the checks evolve. Ideally the checks explain themselves in the JSON, and we don't need to support that JSON with skill-side details for each of the current checks.
| approvers: | ||
| - harche | ||
| - mrunalp | ||
| - wking |
There was a problem hiding this comment.
I'm not clear enough on the maintenance plan for "skills for all OCP" to feel all that optimistic at succeeding as a repo-level approver. I'm happy to give it a go, but I'm curious about how folks see this kind of maintenance working out long-term as bugs come in or component teams ask for their own subdirectories or CI/eval frameworks get developed and maintained to stay happy/useful.
|
|
||
| approvers: | ||
| - harche | ||
| - mrunalp |
There was a problem hiding this comment.
Y'all don't need to sign up to maintain cluster-update skills, because you're already root-level repo approvers, and that access means you're able to approve anything here, regardless of whether you're listed explicitly in the child-dir OWNERS file.
|
|
||
| This directory contains skills which are designed to help agents with ClusterVersion activities such as preparing for cluster updates. | ||
|
|
||
| > **Note:** The skills in this directory are initial drafts. They will evolve as we test and refine them based on real-world usage. |
There was a problem hiding this comment.
This is true everywhere, all the time, right? I don't think we need to call out that nothing is ever perfect and everything can and will evolve. I could see us calling out specific things like "This whole repo is tech-preview and should only be referenced if you're the cluster-version operator" or whatever ground rules we want to set on consumption. Maybe in LABEL calls in the Containerfile, so it's easily visible even without having to drop into image layers? But I don't think we get a lot of value with a generic warning down inside a subdir README.
| specific findings: | ||
|
|
||
| - **`prometheus`** — if `etcd_health` shows degraded conditions, query | ||
| `etcd_disk_backend_commit_duration_seconds` for trends |
There was a problem hiding this comment.
This sets up some compatibility commitments with longer reach. Do we need to encode those here? Or can the suggested-next-step information go into the CVO's rendered check output clearly enough that the consuming agent can figure out how to accomplish that step without having to be supported with additional context from this skill's Markdown?
There was a problem hiding this comment.
ok, we can do that from CVO's output too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
23c5759 to
804b96b
| │ | ||
| Step 4: Classify and decide | ||
| Assign each finding a severity per the classification table | ||
| in section 4.2. Then determine the overall assessment: |
There was a problem hiding this comment.
With 23c5759 -> 804b96b, it's ### 3.2 Blocker Classification, no longer 4.2. Do we need to number sections? That seems brittle, as we see with this most recent reroll. How about "in the 'Blocker Classification' section"? Or, because I don't really understand the reason for this triple-backticked workflow, moving this whole thing to an enumerated list, and using a Markdown link to the section we are referencing?
## Decision policy
### Workflow
1. Parse readiness data ...
1. Verify data completeness...
1. Evaluate findings...
1. Classify and decide. Assign each finding a severity per [the classification table](#blocker-classification). Then determine...
### Blocker classification
...|
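To make the brittleness concrete: heading anchors are derived from the heading text, so a numbered heading changes its anchor whenever the numbering shifts. The sketch below is a simplified approximation of GitHub-style anchor generation (lowercase, drop punctuation, spaces to hyphens), written as an assumption for illustration rather than the exact algorithm.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate a GitHub-style heading anchor (simplified assumption)."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # drop punctuation such as "."
    return "#" + text.replace(" ", "-")


print(heading_anchor("Blocker classification"))
print(heading_anchor("3.2 Blocker Classification"))
print(heading_anchor("4.2 Blocker Classification"))
```

Under this approximation the unnumbered heading keeps the same anchor across rerolls, while the numbered variants do not.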
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: harche, wking |
|
No consumers yet, and we'll have evals set up before we ship anything in the product. /verified by @wking |
|
@wking: This PR has been marked as verified by |
|
/override ci/prow/eval |
|
@harche: Overrode contexts on behalf of harche: ci/prow/eval
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@harche: all tests passed! |
Replaces inline curl/python one-liners in SKILL.md with a standalone Python CLI (plc_lookup.py) that wraps the Red Hat Product Life Cycle API v2.
This addresses review feedback from PR openshift#6:
- Migrate from v1 to v2 API (wking review comment)
- Handle all 5 support status types, not just 3 (wking review comment)
- Drop PLCC acronym in favor of "Product Life Cycle" (wking review comment)
- Add proper error handling for network/API failures
- Add pagination (--limit, --offset) for broad queries
- Add 46 tests (unit + live API integration)
CLI commands:
products <name> [--ocp <ver>] [--limit N] [--offset N]
olm-check --ocp <ver> --operators '<json>'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
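As a reading aid for the command summary above, here is a hypothetical sketch of how that command-line surface could be declared with `argparse`. It is not the actual `plc_lookup.py`; the subcommand and flag names come from the commit message, while the integer types for `--limit` and `--offset` and which flags are required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two subcommands named in the commit message (sketch only)."""
    parser = argparse.ArgumentParser(prog="plc_lookup.py")
    sub = parser.add_subparsers(dest="command", required=True)

    products = sub.add_parser("products")
    products.add_argument("name")
    products.add_argument("--ocp")
    products.add_argument("--limit", type=int)   # assumed to be an integer
    products.add_argument("--offset", type=int)  # assumed to be an integer

    olm = sub.add_parser("olm-check")
    olm.add_argument("--ocp", required=True)        # assumed required
    olm.add_argument("--operators", required=True)  # JSON passed as a string
    return parser


args = build_parser().parse_args(["products", "logging", "--limit", "5"])
print(args.command, args.name, args.limit)
```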
Summary
- `cluster-update/update-advisor/`: OTA upgrade advisor skill ported from `cluster-version-operator/lightspeed/skills/ota-upgrade-advisor/`
- `cluster-update/product-lifecycle/`: Red Hat PLCC lifecycle query skill
- `cluster-update/OWNERS` and `cluster-update/README.md`
Reviewers
These skills relate to the Cluster Version Operator / OTA domain. Tagging approvers familiar with upgrade advisor workflows.
Test plan
- `update-advisor` SKILL.md frontmatter is valid (name, description, allowed-tools)
- `openshift-docs`, `jira`, `product-lifecycle` skill names are correct
- `product-lifecycle` SKILL.md PLCC API endpoint is reachable
🤖 Generated with Claude Code