# .ci/oci-devworkspace-happy-path.sh

Purpose: Integration test validating the DevWorkspace Operator (DWO) with an Eclipse Che deployment.

This script deploys and validates the full DWO + Eclipse Che stack on OpenShift, ensuring the happy-path user workflow succeeds. It is used by the `v14-che-happy-path` Prow CI test.
## Retry behavior

- Che deployment: 2 attempts with exponential backoff (60s base + jitter; see the sketch after this list)
- Cleanup: waits for CheCluster CR deletion before retrying
- Happy-path test: 1 retry with a 30s delay if the Selenium test fails
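For reference, a minimal sketch of how exponential backoff with jitter can be computed in shell. The variable names mirror the script's environment variables, but the script's exact formula may differ:

```bash
# Illustrative: delay = BASE_DELAY * 2^(attempt-1) + up to MAX_JITTER seconds of jitter.
: "${BASE_DELAY:=60}"
: "${MAX_JITTER:=15}"
attempt=1
delay=$(( BASE_DELAY * (2 ** (attempt - 1)) + RANDOM % (MAX_JITTER + 1) ))
echo "Retrying in ${delay}s..."
sleep "$delay"
```

With the defaults, the first retry lands between 60s and 75s, which matches progress messages like "Retrying in 71s...".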
## Readiness checks

- OLM: verifies `catalog-operator` and `olm-operator` are available before Che deployment (2-minute timeout each)
- DWO: waits for the deployment `condition=available` (5-minute timeout)
- Che: chectl's built-in readiness checks ensure the deployment is healthy
## Diagnostics collected on failure

On each failure, the script collects:
- OLM diagnostics (Subscription, InstallPlan, CSV, CatalogSource)
- CatalogSource pod logs
- Che operator logs (last 1000 lines)
- CheCluster CR status (full YAML)
- All pod logs from Che namespace
- Kubernetes events
- chectl server logs
## User experience

- Graceful error handling with stage-specific messages
- Progress indicators such as "Attempt 1/2" and "Retrying in 71s..."
- The script does not crash on failures; it exits cleanly after collecting diagnostics
## Environment variables

All are optional except `DEVWORKSPACE_OPERATOR`:

| Variable | Default | Description |
|---|---|---|
| `CHE_NAMESPACE` | `eclipse-che` | Namespace for the Che deployment |
| `MAX_RETRIES` | `2` | Maximum retry attempts |
| `BASE_DELAY` | `60` | Base delay in seconds for exponential backoff |
| `MAX_JITTER` | `15` | Maximum jitter in seconds |
| `ARTIFACT_DIR` | `/tmp/dwo-e2e-artifacts` | Directory for diagnostic artifacts |
| `DEVWORKSPACE_OPERATOR` | (required) | DWO image to deploy |
## Usage

### In CI

The script is called automatically by the `v14-che-happy-path` Prow job. Prow sets `DEVWORKSPACE_OPERATOR` based on the context.

For PR checks (testing PR code):

```bash
export DEVWORKSPACE_OPERATOR="quay.io/devfile/devworkspace-controller:pr-${PR_NUMBER}-${COMMIT_SHA}"
./.ci/oci-devworkspace-happy-path.sh
```

For periodic/nightly runs (testing the main branch):

```bash
export DEVWORKSPACE_OPERATOR="quay.io/devfile/devworkspace-controller:next"
./.ci/oci-devworkspace-happy-path.sh
```

### Running locally

```bash
export DEVWORKSPACE_OPERATOR="quay.io/youruser/devworkspace-controller:your-tag"
export ARTIFACT_DIR="/tmp/my-test-artifacts"
./.ci/oci-devworkspace-happy-path.sh
```
## Workflow

1. **Deploy DWO**
   - Runs `make install`
   - Waits for the controller deployment to become available (see the sketch after this list)
   - Collects artifacts if the deployment fails

2. **Deploy Che (with retry)**
   - Runs `chectl server:deploy` with extended timeouts (24h); chectl handles readiness checks internally
   - Collects artifacts on failure
   - Cleans up and retries if needed

3. **Run Happy-Path Test**
   - Downloads the test script from the Eclipse Che repository
   - Executes the Che happy-path workflow
   - Retries once after 30s if the test fails
   - Collects artifacts on failure
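The readiness wait in step 1 corresponds to a standard `kubectl wait` on the controller deployment. A minimal sketch, assuming the deployment name and namespace that `make install` uses by default (`devworkspace-controller-manager` in `devworkspace-controller`); adjust both if your install differs:

```bash
# Wait up to 5 minutes for the DWO controller deployment to report Available.
kubectl wait deployment/devworkspace-controller-manager \
  -n devworkspace-controller \
  --for=condition=Available \
  --timeout=300s
```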
## Exit codes

- `0`: Success. All stages completed.
- `1`: Failure. Check `$ARTIFACT_DIR` for diagnostics.
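A wrapper can branch on the exit code to surface the diagnostics directory, for example (illustrative only):

```bash
# Run the happy path and list collected artifacts on failure.
if ./.ci/oci-devworkspace-happy-path.sh; then
  echo "Happy path passed"
else
  echo "Happy path failed; collected diagnostics:"
  ls -R "${ARTIFACT_DIR:-/tmp/dwo-e2e-artifacts}"
fi
```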
## Timeouts

| Component | Timeout | Purpose |
|---|---|---|
| DWO deployment | 5 minutes | Pod becomes available |
| chectl pod wait/ready | 24 hours | Generous allowance for slow environments |
Symptoms: "ERROR: OLM infrastructure is not healthy, cannot proceed with Che deployment"
Check: $ARTIFACT_DIR/olm-diagnostics-olm-check.yaml
Common causes:
- OLM operators not running (
catalog-operator,olm-operator) - Cluster provisioning issues during bootstrap
- Resource constraints preventing OLM operator scheduling Resolution: This indicates a fundamental cluster infrastructure issue. Check cluster health and OLM operator logs before retrying.
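To reproduce the health check by hand, inspect the OLM deployments and marketplace pods directly (a sketch; the namespaces below are the OpenShift defaults):

```bash
# OLM operators live in openshift-operator-lifecycle-manager;
# catalog sources run in openshift-marketplace.
kubectl get deployments -n openshift-operator-lifecycle-manager
kubectl get pods -n openshift-marketplace
```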
Symptoms: "ERROR: DWO controller is not ready"
Check: $ARTIFACT_DIR/devworkspace-controller-info/
Common causes: Image pull errors, resource constraints, webhook conflicts
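Image pull and scheduling problems usually surface in pod events. A quick inspection sketch, assuming DWO's default `devworkspace-controller` namespace:

```bash
# List controller pods, then show the most recent namespace events.
kubectl get pods -n devworkspace-controller
kubectl get events -n devworkspace-controller --sort-by=.lastTimestamp | tail -n 20
```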
Symptoms: "ERROR: chectl server:deploy failed" with timeout-related messages
Check: $ARTIFACT_DIR/che-operator-logs-attempt-*.log, $ARTIFACT_DIR/olm-diagnostics-attempt-*.yaml, $ARTIFACT_DIR/chectl-logs-attempt-*/
Common causes:
- OLM subscription timeout (check
olm-diagnosticsfor subscription state) - Database connection issues
- Image pull failures
- Operator reconciliation errors
- chectl timeout waiting for pods/resources to become ready
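The CheCluster CR status usually narrows this down. A sketch, assuming the default CR name `eclipse-che`; `.status.chePhase` is the phase field reported by recent che-operator versions, so verify the field name against your CR:

```bash
# Print the reported phase, then the tail of the full status.
kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.chePhase}'; echo
kubectl get checluster eclipse-che -n eclipse-che -o yaml | tail -n 40
```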
Symptoms: "ERROR: chectl server:deploy failed"
Check: $ARTIFACT_DIR/eclipse-che-info/ for pod logs
Common causes: Configuration errors, resource limits, TLS certificate issues
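To look at a specific failing component, list the pods and tail the operator log (a sketch; `che-operator` is the usual deployment name when the operator runs in the Che namespace):

```bash
# List pods, then tail the Che operator log for reconcile errors.
kubectl get pods -n eclipse-che -o wide
kubectl logs -n eclipse-che deployment/che-operator --tail=50
```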
### OLM subscription timeout

Symptoms: Subscription timeout after 120 seconds with no resources created

Check: `$ARTIFACT_DIR/olm-diagnostics-attempt-*.yaml`, `$ARTIFACT_DIR/catalogsource-logs-attempt-*.log`

Common causes:

- CatalogSource pod not pulling/running
- InstallPlan not created (subscription cannot resolve dependencies)
- Cluster resource exhaustion preventing operator pod scheduling

Resolution: Check OLM operator logs and CatalogSource pod status. See the "Advanced Troubleshooting" section for monitoring and alternative deployment options.
## Artifact layout

After a failed test run:

```
$ARTIFACT_DIR/
├── attempt-log.txt
├── failure-report.json
├── failure-report.md
├── devworkspace-controller-info/
│   ├── <pod-name>-<container>.log
│   └── events.log
├── eclipse-che-info/
│   ├── <pod-name>-<container>.log
│   └── events.log
├── che-operator-logs-attempt-1.log
├── che-operator-logs-attempt-2.log
├── checluster-status-attempt-1.yaml
├── checluster-status-attempt-2.yaml
├── olm-diagnostics-attempt-1.yaml
├── olm-diagnostics-attempt-2.yaml
├── catalogsource-logs-attempt-1.log
├── catalogsource-logs-attempt-2.log
├── chectl-logs-attempt-1/
└── chectl-logs-attempt-2/
```
## Prerequisites

- `kubectl`: Kubernetes CLI
- `oc`: OpenShift CLI (for log collection)
- `chectl`: Eclipse Che CLI (v7.114.0+)
- `jq`: JSON processor (for chectl)
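A quick preflight sketch to confirm all four tools are on `PATH` before running the script:

```bash
# Fail early if any required CLI is missing.
for tool in kubectl oc chectl jq; do
  command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; exit 1; }
done
```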
## Advanced Troubleshooting

If you experience persistent OLM subscription timeouts (see the `olm-diagnostics-*.yaml` artifacts), the options below can help.

### OLM infrastructure health check

The script verifies OLM infrastructure health before deploying Che:

- Checks that `catalog-operator` is available
- Checks that `olm-operator` is available
- Verifies that `openshift-marketplace` is accessible

If OLM is unhealthy, the test fails fast with diagnostic artifacts instead of waiting through timeouts.
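A minimal sketch of such a fail-fast check, assuming OLM's default namespace; the script's actual implementation may differ:

```bash
# Fail fast if either OLM operator is not Available within the 2-minute budget.
for dep in catalog-operator olm-operator; do
  kubectl wait deployment/"$dep" \
    -n openshift-operator-lifecycle-manager \
    --for=condition=Available --timeout=120s || exit 1
done
```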
### Active subscription monitoring

For debugging stuck subscriptions, you can add active monitoring to detect zero-progress scenarios earlier:

```bash
# Example: monitor the subscription state every 10 seconds for up to 5 minutes.
elapsed=0
while [ $elapsed -lt 300 ]; do
  state=$(kubectl get subscription eclipse-che -n eclipse-che \
    -o jsonpath='{.status.state}' 2>/dev/null)
  echo "[$elapsed/300s] Subscription state: ${state:-unknown}"
  if [ "$state" = "AtLatestKnown" ]; then
    break
  fi
  sleep 10
  elapsed=$((elapsed + 10))
done
```

This helps identify whether subscriptions are progressing slowly or are completely stuck.
### Alternative: direct operator deployment

For CI environments with persistent OLM issues, consider deploying the Che operator directly instead of via OLM:

```bash
# --installer=operator uses direct YAML deployment instead of OLM.
chectl server:deploy \
  --installer=operator \
  -p openshift \
  --batch \
  --telemetry=off \
  --skip-devworkspace-operator \
  --chenamespace="$CHE_NAMESPACE"
```

Trade-offs:
- ✅ Bypasses OLM infrastructure entirely
- ✅ More reliable in resource-constrained CI environments
- ❌ Doesn't test OLM integration path (used by production OperatorHub)
- ❌ May miss OLM-specific issues
When to use: Temporary workaround for CI infrastructure issues while OLM problems are being resolved.
### Debugging stuck subscriptions

If OLM subscriptions consistently time out (visible in `olm-diagnostics-*.yaml`):

1. Check the OLM operator logs:

   ```bash
   kubectl logs -n openshift-operator-lifecycle-manager \
     deployment/catalog-operator --tail=100
   kubectl logs -n openshift-operator-lifecycle-manager \
     deployment/olm-operator --tail=100
   ```

2. Verify the CatalogSource pod is running:

   ```bash
   kubectl get pods -n openshift-marketplace \
     -l olm.catalogSource=eclipse-che
   kubectl logs -n openshift-marketplace \
     -l olm.catalogSource=eclipse-che
   ```

3. Check InstallPlan creation:

   ```bash
   kubectl get installplan -n eclipse-che -o yaml
   ```

   - If no InstallPlan exists, OLM could not resolve the subscription
   - If an InstallPlan exists but is not complete, check its status conditions
## Failure reports

The script automatically generates failure reports and posts them as PR comments after each run (both failures and successes with retries). Do not delete these comments — they are used to track flakiness patterns across PRs.
Each report includes a table of all attempts with:

- Attempt: the attempt number (e.g., `1/2`, `2/2`)
- Stage: the function that failed (`deployChe`, `runHappyPathTest`, etc.)
- Result: `PASSED` or `FAILED`
- Reason: a classified failure reason (e.g., "Che operator reconciliation failure")
### Failure categories

| Category | Meaning | Retryable? |
|---|---|---|
| `INFRA` | Infrastructure issue (OLM, image pull, operator reconciliation) | Yes — `/retest` |
| `TEST` | Test execution issue (Dashboard UI timeout, workspace start) | Maybe |
| `MIXED` | Both infrastructure and test issues across attempts | Yes — `/retest` |
| `UNKNOWN` | Could not classify — check artifacts | Investigate |
Reports are always saved to `$ARTIFACT_DIR/` regardless of whether PR commenting succeeds:

- `failure-report.json` — structured data for programmatic analysis
- `failure-report.md` — human-readable markdown (same as the PR comment)
- `attempt-log.txt` — raw attempt tracking log
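For programmatic analysis, the JSON report can be queried with `jq`. A sketch under the assumption that the report exposes an `attempts` array with `stage`, `result`, and `reason` fields; check the actual schema against a generated report:

```bash
# Hypothetical schema: print one summary line per attempt.
jq -r '.attempts[] | "\(.stage): \(.result) (\(.reason))"' \
  "$ARTIFACT_DIR/failure-report.json"
```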
Over time, these reports reveal:
- Which failure categories are most common
- Whether flakiness is improving or worsening
- Which infrastructure components are least reliable
- Whether retry logic is effective (passed-on-retry patterns)