Adding Integration Tests #21

Open
shekhar316 wants to merge 2 commits into kruize:main from shekhar316:integrationTests

@shekhar316 shekhar316 commented May 6, 2026

Summary by Sourcery

Add a self-contained end-to-end test framework for Kruize Optimizer, including deployment orchestration and comprehensive workflow and webhook tests.

New Features:

  • Introduce a Python-based E2E test runner that provisions clusters, deploys monitoring, workloads, Kruize, and optimizer, then executes pytest suites and generates HTML reports.
  • Add complete workflow integration tests that validate optimizer pod/service health, profile installation, bulk job execution, webhooks, and health endpoints via live APIs and logs.
  • Provide negative webhook test coverage to ensure robust error handling for malformed and invalid payloads against the optimizer webhook endpoint.

Enhancements:

  • Add reusable utilities for cluster interactions, deployments, API clients, and log parsing to support E2E testing across Kind and OpenShift clusters.

Documentation:

  • Document the E2E testing architecture, workflows, configuration, and usage instructions in a dedicated README for running tests locally and in CI.

Tests:

  • Add E2E pytest suites and supporting scripts/configuration to validate complete optimizer workflows, webhook behavior, and system health across different deployment modes and cluster types.

Chores:

  • Define Python requirements for E2E tests and add shared configuration files for test clusters and runtime settings.

shekhar316 added 2 commits May 6, 2026 13:40
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>

sourcery-ai Bot commented May 6, 2026

Reviewer's Guide

Introduce a self-contained Python-based E2E testing framework for Kruize Optimizer, including an orchestrated deployment runner, cluster and log utilities, configuration, and new pytest suites for full workflow and webhook validation.

File-Level Changes

Change: Add a Python E2E test runner that orchestrates cluster setup, Prometheus and workload deployment, Kruize/optimizer deployment (operator or manifest mode), port-forwarding, pytest execution, and teardown.
Details:
  • Implement E2ETestRunner to load YAML config, drive DeploymentManager/ClusterManager, and run pytest with HTML reporting
  • Support multiple cluster types (kind, openshift, minikube) and deployment modes (operator vs manifest) with CLI arguments and config overrides
  • Manage lifecycle of Kind clusters, namespaces, benchmarks, and port-forward processes with optional skip-cleanup for debugging
Files:
  • tests/e2e/run_e2e_tests.py
  • tests/e2e/config/test_config.yaml
  • tests/e2e/config/kind-config.yaml
  • tests/e2e/requirements.txt

Change: Provide deployment, cluster, API, and log utility layers to encapsulate Kubernetes operations and optimizer/Kruize interactions for reuse across tests.
Details:
  • Implement DeploymentManager to clone required repos, deploy Prometheus, apply kustomize overlays for operator/optimizer, deploy sparse-checkout sysbench benchmarks, label workloads, and set up port-forwards
  • Implement ClusterManager wrapper around kubectl/oc for namespace management, pod readiness, logs, manifests, kustomize, and port-forwarding
  • Implement KruizeAPIClient and OptimizerAPIClient for REST calls, health checks, profile listing, jobs overview, webhooks, and wait helpers
  • Implement log_utils for parsing optimizer logs, validating profile installation, bulk job/webhook markers, and extracting job completion metadata
Files:
  • tests/e2e/utils/deployment_manager.py
  • tests/e2e/utils/cluster_utils.py
  • tests/e2e/utils/kruize_utils.py
  • tests/e2e/utils/log_utils.py
  • tests/e2e/utils/__init__.py

Change: Add a comprehensive end-to-end workflow pytest suite verifying optimizer deployment, profiles, workloads, bulk jobs, webhooks, logs, and health endpoints.
Details:
  • Create TestCompleteWorkflow class that uses fixtures for config, ClusterManager, and API clients to structure sequential workflow checks
  • Load configsReferenceIndex.json and cross-check metric/metadata profiles and layers against Kruize API responses
  • Verify optimizer and sysbench pods readiness, log-driven service start, profile installation logs, bulk jobs with autotune label, job completion, webhook counters, error scanning, and health endpoints
Files:
  • tests/e2e/tests/test_01_complete_workflow.py

Change: Introduce webhook-focused negative/positive pytest suite and a standalone probe script for inspecting webhook responses.
Details:
  • Add TestWebhookNegativeScenarios to POST various malformed/edge-case payloads (invalid JSON, null/empty payloads, missing/malformed fields, invalid job IDs, missing headers) and assert appropriate error codes
  • Include a positive control test for valid payload acceptance and a mixed-validity multi-payload case
  • Provide test_webhook_responses.py script to manually send canned payloads to a configurable BASE_URL and print raw responses for debugging
Files:
  • tests/e2e/tests/test_04_webhook.py
  • tests/e2e/test_webhook_responses.py

Change: Document and structure the E2E framework layout, usage patterns, and CI integration guidance.
Details:
  • Add README detailing architecture, directory layout, prerequisites, configuration, supported cluster types/modes, workflow phases, debugging tips, and sample CI (GitHub Actions) integration
  • Ensure tests/e2e is self-contained with repo-clone directory (.repos), configuration, and documented naming conventions for future suites
Files:
  • tests/e2e/README.md
  • tests/e2e/tests/__init__.py

Possibly linked issues

  • ## Add a e2e test for Optimizer: PR implements a comprehensive E2E testing framework and multiple Optimizer tests, satisfying and exceeding the requested e2e test.
  • #(unassigned): The PR implements actual-cluster E2E tests, including detailed negative webhook scenarios explicitly requested in the issue.

@shekhar316 shekhar316 marked this pull request as ready for review May 11, 2026 08:19
@shekhar316 shekhar316 requested a review from chandrams May 11, 2026 08:20

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • The test_webhook_multiple_payloads_one_invalid test in test_04_webhook.py appears truncated and never asserts on the response or logs a result, so it should be completed to validate the expected 400 and avoid a silently passing/no-op test.
  • In run_e2e_tests.py, configuration keys are used inconsistently (e.g., self.config.get('kruize_port', ...) vs config['api']['kruize_port'], and kind_cluster_name in cleanup vs cluster.name in setup); aligning these to a single canonical structure under cluster/api will prevent mismatched ports or orphaned clusters during teardown.
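
One possible way to align the lookups is to route every read through a single helper that resolves dotted paths in the nested config. The following is a minimal sketch, not the code in this PR: the `get_setting` helper and the sample values are hypothetical, and the `api.kruize_port` / `cluster.name` layout is assumed from the keys named in the comment above.

```python
from typing import Any, Mapping

# Sentinel so that None can still be passed as a legitimate default.
_MISSING = object()


def get_setting(config: Mapping[str, Any], dotted_key: str, default: Any = _MISSING) -> Any:
    """Resolve a dotted key such as 'api.kruize_port' against a nested config."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            if default is _MISSING:
                raise KeyError(f"missing config key: {dotted_key}")
            return default
        node = node[part]
    return node


# Hypothetical config mirroring the nested structure suggested in the review.
config = {
    "cluster": {"type": "kind", "name": "kruize-e2e"},
    "api": {"kruize_port": 8080},
}

# Setup and cleanup would both read the same canonical keys.
kruize_port = get_setting(config, "api.kruize_port")
cluster_name = get_setting(config, "cluster.name")
```

Raising `KeyError` on a missing key, rather than silently falling back, makes a mismatch between setup and teardown visible early; callers that really want a fallback pass `default` explicitly.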

Comment on lines +356 to +365 of tests/e2e/tests/test_01_complete_workflow.py
    def test_08_webhook_callback_received(self, optimizer_client, config):
        """
        Test: Verify webhook callback is received
        Expected: Experiment counters are updated
        """
        logger.info("Test: Verify webhook callback")

        # Get current state
        jobs_overview = optimizer_client.get_jobs_overview()
        total_experiments = jobs_overview.get('totalExperiments', 0)

issue (testing): Webhook callback test does not fail when the expected callback is never observed, which makes the test misleading relative to its docstring.

The test’s docstring expects experiment counters to be updated, but if totalExperimentsProcessed never increases within the timeout, the test only logs warnings and still passes. This can mask webhook regressions. Please either fail when webhook_received stays False (or otherwise unmet preconditions are detected), or clearly mark this as a non-strict check and update the expectations in the test’s documentation accordingly.
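
A strict version of this check could poll the counter and fail once the timeout expires. The sketch below is one way to do that under stated assumptions: `wait_for_counter_increase` and `assert_webhook_received` are hypothetical helpers, not functions from this PR, and the counter is read through a caller-supplied callable so the example runs without a cluster.

```python
import time
from typing import Callable


def wait_for_counter_increase(
    read_counter: Callable[[], int],
    baseline: int,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll read_counter() until it exceeds baseline or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_counter() > baseline:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def assert_webhook_received(
    read_counter: Callable[[], int],
    baseline: int,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> None:
    """Fail the calling test when the counter never moves past baseline."""
    received = wait_for_counter_increase(read_counter, baseline, timeout_s, interval_s)
    assert received, (
        f"no webhook callback observed within {timeout_s}s: "
        f"counter stayed at or below {baseline}"
    )
```

Inside the test, `read_counter` could be something like `lambda: optimizer_client.get_jobs_overview().get('totalExperimentsProcessed', 0)`, assuming the jobs overview exposes that field as the comment implies.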

"""
logger.info("Test: Webhook with invalid JSON")

response = requests.post(

issue (testing): The test_webhook_multiple_payloads_one_invalid test does not assert on the response, so it never actually verifies behavior.

This test currently sends one valid and one invalid payload but never checks the result, so it will pass no matter what the server does. Please add an assertion for the expected status code (e.g. assert response.status_code == 400 as suggested by the docstring) and, if relevant, key fields in the response body to verify the mixed-payload behavior.
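
The missing assertion could be factored into a small helper so that every negative test reports the response body on failure. This is a sketch only: `assert_status` is a hypothetical name, and the `SimpleNamespace` object stands in for the value returned by `requests.post(...)`, modelling just the `status_code` and `text` attributes.

```python
from types import SimpleNamespace


def assert_status(response, expected: int) -> None:
    """Fail with a truncated response body in the message when the status differs."""
    actual = response.status_code
    assert actual == expected, (
        f"expected HTTP {expected}, got {actual}; body: {response.text[:200]!r}"
    )


# Stand-in for a real response to the mixed valid/invalid payload request.
# The body shown here is illustrative, not the optimizer's actual error format.
rejected = SimpleNamespace(status_code=400, text='{"error": "invalid payload"}')
assert_status(rejected, 400)
```

In the real test, the final line would become `assert_status(response, 400)` right after the POST, which keeps the test from passing when the server accepts the mixed payload.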

@chandrams chandrams requested a review from shreyabiradar07 May 13, 2026 06:47
@chandrams

@shreyabiradar07 Can you please review this PR?

Comment thread tests/e2e/README.md
│   └── log_utils.py                  # Log parsing
└── tests/
    ├── test_01_complete_workflow.py  # 10 tests
    ├── test_02_profiles.py           # 10 tests

README.md documents test_02_profiles.py (10 tests) and test_03_bulk_jobs.py (11 tests), but these files don't exist under tests. Please verify and update.
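
A small check along these lines could catch this kind of drift between the README and the tests directory. It is a rough sketch under assumptions: `missing_documented_tests` is a hypothetical helper, and it assumes every `test_*.py` name mentioned in the README is expected to live directly in the given directory.

```python
import re
from pathlib import Path
from typing import List

# Matches file names such as test_02_profiles.py anywhere in the README text.
TEST_FILE_PATTERN = re.compile(r"\btest_\w+\.py\b")


def missing_documented_tests(readme_text: str, tests_dir: Path) -> List[str]:
    """Return documented test file names that do not exist in tests_dir."""
    documented = sorted(set(TEST_FILE_PATTERN.findall(readme_text)))
    return [name for name in documented if not (tests_dir / name).is_file()]
```

For example, `missing_documented_tests(Path("tests/e2e/README.md").read_text(), Path("tests/e2e/tests"))` would list the README entries that have no matching file, given that those paths match the layout described in this PR.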

        self.deployment_mgr.create_namespace(app_namespace)

        # Deploy Prometheus first
        self.deployment_mgr.deploy_prometheus()

Prometheus deployment can be skipped for OpenShift clusters.
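
The skip could be expressed as a guard on the cluster type. The sketch below only illustrates the branching: `plan_deployment_steps` is a hypothetical function, and the step names other than `create_namespace` and `deploy_prometheus` are placeholders rather than methods from this PR.

```python
from typing import List


def plan_deployment_steps(cluster_type: str) -> List[str]:
    """Return the ordered deployment steps for the given cluster type."""
    steps = ["create_namespace"]
    # OpenShift ships its own monitoring stack, so a separate Prometheus
    # deployment is only planned for the other cluster types.
    if cluster_type.strip().lower() != "openshift":
        steps.append("deploy_prometheus")
    # Placeholder names for the remaining phases of the runner.
    steps.extend(["deploy_benchmarks", "deploy_kruize", "deploy_optimizer"])
    return steps
```

In the runner itself, the same condition would wrap the `self.deployment_mgr.deploy_prometheus()` call shown above.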

        action='store_true',
        help='Skip cleanup after tests'
    )


Adding a namespace argument would help override the default namespace for OpenShift deployments; otherwise, users need to edit the test_config.yaml file every time.
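
Such an override might look like the following. This is a sketch with assumed names: the `--namespace` and `--skip-cleanup` flag spellings, the top-level `namespace` config key, and the `resolve_namespace` helper are all hypothetical and may not match run_e2e_tests.py or test_config.yaml.

```python
import argparse
from typing import Any, Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with an optional namespace override."""
    parser = argparse.ArgumentParser(description="Run E2E tests")
    parser.add_argument(
        "--skip-cleanup",
        action="store_true",
        help="Skip cleanup after tests",
    )
    parser.add_argument(
        "--namespace",
        default=None,
        help="Override the namespace configured in test_config.yaml",
    )
    return parser


def resolve_namespace(config: Mapping[str, Any], cli_namespace: Optional[str]) -> str:
    """Prefer the CLI value; fall back to the config file, then to 'default'."""
    if cli_namespace:
        return cli_namespace
    return config.get("namespace", "default")
```

Keeping the precedence in one function (CLI first, then config) means the override applies consistently wherever the runner needs the namespace.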

@shreyabiradar07

@shekhar316 Please share the test results for both cluster types.
