@@ -10,15 +10,14 @@ This script deploys and validates the full DevWorkspace Operator + Eclipse Che s
1010## Features
1111
1212### Retry Logic
13- - ** Max retries ** : 2 (3 total attempts )
14- - ** Exponential backoff ** : 60s base delay with 0-15s jitter
15- - ** Cleanup ** : Deletes failed Che deployment before retry
13+ - ** Che deployment ** : Up to 2 attempts with exponential backoff (60s base delay plus random jitter)
14+ - ** Cleanup ** : Waits for CheCluster CR deletion before retry
15+ - ** Happy-path test retry ** : 1 retry with 30s delay if Selenium test fails
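
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `backoff_delay` and `retry_with_backoff`, the `BACKOFF_BASE` and `BACKOFF_JITTER` variables, and the 0-15s jitter range (taken from the earlier revision of this README) are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and defaults are illustrative assumptions.

# backoff_delay ATTEMPT [BASE] [MAX_JITTER]
# Prints BASE * 2^(ATTEMPT-1) plus a random jitter of 0..MAX_JITTER seconds.
backoff_delay() {
  local attempt=$1 base=${2:-60} max_jitter=${3:-15} jitter
  # awk's rand() is in [0,1), so int(rand() * (m + 1)) is in 0..m
  jitter=$(awk -v m="$max_jitter" 'BEGIN { srand(); print int(rand() * (m + 1)) }')
  echo $(( base * (1 << (attempt - 1)) + jitter ))
}

# retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_with_backoff() {
  local max_attempts=$1 attempt=1 delay
  shift
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt will follow
    if [ "$attempt" -lt "$max_attempts" ]; then
      delay=$(backoff_delay "$attempt" "${BACKOFF_BASE:-60}" "${BACKOFF_JITTER:-15}")
      echo "Attempt ${attempt}/${max_attempts} failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

With two attempts there is only a single wait, so the exponential factor matters only if the attempt count is raised later. A call such as `retry_with_backoff 2 deployChe` would mirror the Che deployment case, assuming `deployChe` is a shell function that returns non-zero on failure.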
1616
1717### Health Checks
1818- ** OLM** : Verifies ` catalog-operator ` and ` olm-operator ` are available before Che deployment (2-minute timeout each)
1919- ** DWO** : Waits for ` deployment condition=available ` (5-minute timeout)
20- - ** Che** : Waits for ` CheCluster condition=Available ` (10-minute timeout)
21- - ** Pods** : Verifies all Che pods are ready
20+ - ** Che** : chectl's built-in readiness checks ensure deployment is healthy
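
As a rough sketch of the OLM check listed above, the two deployments could be awaited with `kubectl wait`. The function name `wait_for_olm`, the `OLM_NAMESPACE` variable, and the default `olm` namespace are assumptions for illustration; the real script may use `oc` and a different namespace (OpenShift clusters typically place OLM elsewhere).

```shell
#!/usr/bin/env bash
# Hypothetical helper; the namespace default is an assumption.

# wait_for_olm
# Waits up to 2 minutes for each OLM deployment to report Available.
wait_for_olm() {
  local namespace=${OLM_NAMESPACE:-olm} name
  for name in catalog-operator olm-operator; do
    if ! kubectl wait --for=condition=Available "deployment/${name}" \
        --namespace "$namespace" --timeout=120s; then
      echo "ERROR: ${name} not available after 2 minutes" >&2
      return 1
    fi
  done
}
```

Failing fast here keeps a broken OLM from surfacing later as a much slower Che deployment timeout.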
2221
2322### Artifact Collection
2423On each failure, collects:
@@ -82,14 +81,14 @@ export ARTIFACT_DIR="/tmp/my-test-artifacts"
8281
83822 . ** Deploy Che** (with retry)
8483 - Runs ` chectl server:deploy ` with extended timeouts (24h)
85- - Waits for CheCluster condition=Available
86- - Verifies all pods are ready
84+ - chectl handles readiness checks internally
8785 - Collects artifacts on failure
8886 - Cleans up and retries if needed
8987
90883 . ** Run Happy-Path Test**
9189 - Downloads test script from Eclipse Che repository
9290 - Executes Che happy-path workflow
91+ - Retries once after 30s if test fails
9392 - Collects artifacts on failure
9493
9594## Exit Codes
@@ -102,8 +101,6 @@ export ARTIFACT_DIR="/tmp/my-test-artifacts"
102101| Component | Timeout | Purpose |
103102| -----------| ---------| ---------|
104103| DWO deployment | 5 minutes | Pod becomes available |
105- | CheCluster Available | 10 minutes | Che fully deployed |
106- | Che pods ready | 5 minutes | All pods running |
107104| chectl pod wait/ready | 24 hours | Generous for slow environments |
108105
109106## Common Failures
@@ -123,13 +120,14 @@ export ARTIFACT_DIR="/tmp/my-test-artifacts"
123120** Common causes** : Image pull errors, resource constraints, webhook conflicts
124121
125122### Che Deployment Timeout
126- ** Symptoms** : "ERROR: CheCluster did not become available within 10 minutes"
127- ** Check** : ` $ARTIFACT_DIR/che-operator-logs-attempt-*.log ` , ` $ARTIFACT_DIR/olm-diagnostics-attempt-*.yaml `
123+ ** Symptoms** : "ERROR: chectl server:deploy failed" with timeout-related messages
124+ ** Check** : ` $ARTIFACT_DIR/che-operator-logs-attempt-*.log ` , ` $ARTIFACT_DIR/olm-diagnostics-attempt-*.yaml ` , ` $ARTIFACT_DIR/chectl-logs-attempt-*/ `
128125** Common causes** :
129126- OLM subscription timeout (check ` olm-diagnostics ` for subscription state)
130127- Database connection issues
131128- Image pull failures
132129- Operator reconciliation errors
130+ - chectl timeout waiting for pods/resources to become ready
133131
134132### Pod CrashLoopBackOff
135133** Symptoms** : "ERROR: chectl server:deploy failed"
@@ -150,6 +148,9 @@ export ARTIFACT_DIR="/tmp/my-test-artifacts"
150148After a failed test run:
151149```
152150$ARTIFACT_DIR/
151+ ├── attempt-log.txt
152+ ├── failure-report.json
153+ ├── failure-report.md
153154├── devworkspace-controller-info/
154155│ ├── <pod-name>-<container>.log
155156│ └── events.log
@@ -256,6 +257,42 @@ If OLM subscriptions consistently timeout (visible in `olm-diagnostics-*.yaml`):
256257 - If no InstallPlan exists, OLM couldn't resolve the subscription
257258 - If InstallPlan exists but isn't complete, check its status conditions
258259
260+ ## CI Failure Reports
261+
262+ The script automatically generates a failure report and posts it as a PR comment after every run that fails or that passes only after a retry. ** Do not delete these comments** ; they are used to track flakiness patterns across PRs.
263+
264+ ### What gets reported
265+
266+ Each report includes a table of all attempts with:
267+ - ** Attempt** : Which attempt number (e.g., ` 1/2 ` , ` 2/2 ` )
268+ - ** Stage** : Which function failed (` deployChe ` , ` runHappyPathTest ` , etc.)
269+ - ** Result** : ` PASSED ` or ` FAILED `
270+ - ** Reason** : Classified failure reason (e.g., "Che operator reconciliation failure")
271+
272+ ### Failure categories
273+
274+ | Category | Meaning | Retryable? |
275+ | ----------| ---------| ------------|
276+ | ` INFRA ` | Infrastructure issue (OLM, image pull, operator reconciliation) | Yes — ` /retest ` |
277+ | ` TEST ` | Test execution issue (Dashboard UI timeout, workspace start) | Maybe |
278+ | ` MIXED ` | Both infrastructure and test issues across attempts | Yes — ` /retest ` |
279+ | ` UNKNOWN ` | Could not classify — check artifacts | Investigate |
280+
281+ ### Report artifacts
282+
283+ Reports are always saved to ` $ARTIFACT_DIR/ ` regardless of whether PR commenting succeeds:
284+ - ` failure-report.json ` — structured data for programmatic analysis
285+ - ` failure-report.md ` — human-readable markdown (same as the PR comment)
286+ - ` attempt-log.txt ` — raw attempt tracking log
287+
288+ ### Why these comments matter
289+
290+ Over time, these reports reveal:
291+ - Which failure categories are most common
292+ - Whether flakiness is improving or worsening
293+ - Which infrastructure components are least reliable
294+ - Whether retry logic is effective (passed-on-retry patterns)
295+
259296## Related Documentation
260297
261298- [ Eclipse Che Documentation] ( https://eclipse.dev/che/docs/ )