Commit 53e6ccb

DavidRajnoha and claude committed
fix(tests): revert to single-phase polling with 3-min interval
The 2-phase approach (Thanos API check + UI traversal) failed because the oc get --raw query to Thanos Querier doesn't return alert data as expected in the CI environment.

Revert to the original single-phase UI traversal but with a 3-minute interval instead of 1-minute. This reduces the number of heavy findIncidentWithAlert iterations from ~30 to ~10, keeping the Cypress command log small enough to avoid Chrome OOM (exit code 137).

CI run: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/860/pull-ci-openshift-monitoring-plugin-main-e2e-incidents/2038898485348012032
Classifications: TEST_BUG (Phase 1 Thanos API query not working in CI)
22/23 tests passed; no OOM confirmed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent aae2094 commit 53e6ccb
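
The "~30 to ~10" figure in the message follows from dividing the 30-minute timeout by the polling interval. The sketch below replays that arithmetic with a virtual clock. `simulatePolling` is a hypothetical helper, not part of the test suite or of cypress-wait-until, and it assumes the first check runs immediately and that each check takes no time.

```typescript
// Hypothetical helper (not from the repository): replays a fixed-interval
// poll against a virtual clock so the attempt count can be inspected
// without waiting. Assumes the first check runs at t = 0 and that each
// check itself takes no time.
interface PollResult {
  succeeded: boolean;
  attempts: number;
  elapsedMs: number;
}

function simulatePolling(
  check: (elapsedMs: number) => boolean,
  intervalMs: number,
  timeoutMs: number,
): PollResult {
  let attempts = 0;
  for (let elapsedMs = 0; elapsedMs < timeoutMs; elapsedMs += intervalMs) {
    attempts += 1;
    if (check(elapsedMs)) {
      return { succeeded: true, attempts, elapsedMs };
    }
  }
  return { succeeded: false, attempts, elapsedMs: timeoutMs };
}

const MINUTE = 60_000;

// Worst case: the incident never appears, so every iteration runs.
const neverFound = () => false;
const oneMinuteWorstCase = simulatePolling(neverFound, 1 * MINUTE, 30 * MINUTE);
const threeMinuteWorstCase = simulatePolling(neverFound, 3 * MINUTE, 30 * MINUTE);

// Illustrative case: the incident becomes visible 8 minutes in.
const visibleAfterEight = (elapsedMs: number) => elapsedMs >= 8 * MINUTE;
const oneMinuteTypical = simulatePolling(visibleAfterEight, 1 * MINUTE, 30 * MINUTE);
const threeMinuteTypical = simulatePolling(visibleAfterEight, 3 * MINUTE, 30 * MINUTE);

console.log({ oneMinuteWorstCase, threeMinuteWorstCase, oneMinuteTypical, threeMinuteTypical });
```

Under these assumptions the longer interval trades detection latency (at most one extra interval) for fewer heavy UI traversals, which is the trade-off the commit relies on.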

1 file changed

Lines changed: 9 additions & 27 deletions

web/cypress/e2e/incidents/00.coo_incidents_e2e.cy.ts

@@ -42,36 +42,18 @@ describe('BVT: Incidents - e2e', { tags: ['@smoke', '@slow', '@incidents', '@e2e
     incidentsPage.goTo();
     incidentsPage.clearAllFilters();

-    const intervalMs = 60_000;
-    const maxMinutes = 30;
-
-    cy.log('1.2 Wait for alert to start firing on cluster');
-    // Phase 1: Poll the Thanos Querier API to check if the alert is actually
-    // firing. This is lightweight — a single cy.exec per iteration with no
-    // Chrome DOM interaction, preventing the OOM (exit code 137) caused by
-    // repeated heavy UI traversals accumulating Cypress command log snapshots.
-    const kubeconfigPath = Cypress.env('KUBECONFIG_PATH');
-    cy.waitUntil(
-      () => cy.exec(
-        `oc get --raw '/api/v1/namespaces/openshift-monitoring/services/thanos-querier:web/proxy/api/v1/rules?type=alert' --kubeconfig ${kubeconfigPath}`,
-        { failOnNonZeroExit: false, timeout: 20000 },
-      ).then((result) => result.code === 0 && result.stdout.includes(currentAlertName)),
-      {
-        interval: 30_000,
-        timeout: 20 * 60_000,
-        errorMsg: `Alert ${currentAlertName} not firing on cluster within 20 minutes`,
-      }
-    );
-
-    cy.log('1.2.1 Wait for incident detection to pick up the firing alert');
-    // Phase 2: Alert is confirmed firing. Wait for incident detection to group
-    // it into an incident. Uses the UI traversal but with fewer iterations
-    // since incident detection typically takes 5-10 minutes after alert fires.
+    cy.log('1.2 Wait for incident with custom alert to appear');
+    // Use a 3-minute interval instead of 1-minute to reduce the number of
+    // heavy UI traversals. Each findIncidentWithAlert call generates hundreds
+    // of Cypress commands (clicking bars, expanding rows, checking text).
+    // With 1-min interval the command log grew unbounded over 30 iterations
+    // causing Chrome OOM (exit code 137). With 3-min interval we get ~10
+    // iterations max, keeping memory within container limits.
     cy.waitUntil(
       () => incidentsPage.findIncidentWithAlert(currentAlertName),
       {
-        interval: 2 * intervalMs,
-        timeout: 15 * intervalMs,
+        interval: 3 * 60_000,
+        timeout: 30 * 60_000,
       }
     );

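The removed Phase 1 check accepted any occurrence of the alert name in the raw response. In the Prometheus-style `/api/v1/rules` response, each alerting rule carries a `state` field (`inactive`, `pending`, or `firing`), so a substring match can also succeed for a rule that is loaded but not firing. The sketch below shows a stricter check on that JSON shape. `isAlertFiring` and the sample payload are hypothetical, this is not what the commit does, and it would not by itself address the CI failure described in the message, where the query did not return the expected data at all.

```typescript
// Minimal subset of the Prometheus-style /api/v1/rules response.
// Only the fields used below are declared.
interface Rule {
  name: string;
  type: string;
  state?: string;
}

interface RulesResponse {
  status: string;
  data?: { groups?: { rules?: Rule[] }[] };
}

// Hypothetical helper (not from the repository): returns true only when an
// alerting rule with exactly this name reports state "firing".
function isAlertFiring(stdout: string, alertName: string): boolean {
  let parsed: RulesResponse;
  try {
    parsed = JSON.parse(stdout) as RulesResponse;
  } catch {
    return false; // Not JSON, e.g. an error page or empty output.
  }
  if (parsed.status !== 'success') {
    return false;
  }
  const groups = parsed.data?.groups ?? [];
  return groups.some((group) =>
    (group.rules ?? []).some(
      (rule) =>
        rule.type === 'alerting' && rule.name === alertName && rule.state === 'firing',
    ),
  );
}

// Invented sample payload for illustration only.
const sampleStdout = JSON.stringify({
  status: 'success',
  data: {
    groups: [
      {
        rules: [
          { name: 'CustomAlertA', type: 'alerting', state: 'inactive' },
          { name: 'CustomAlertB', type: 'alerting', state: 'firing' },
        ],
      },
    ],
  },
});

console.log(isAlertFiring(sampleStdout, 'CustomAlertA'));
console.log(isAlertFiring(sampleStdout, 'CustomAlertB'));
```

With this payload a plain `stdout.includes('CustomAlertA')` would be true even though that rule is inactive, while the structured check distinguishes the two.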