Skip to content

Adding Documentation For Optimizer #20

Open
shekhar316 wants to merge 9 commits into
kruize:mvp_demofrom
shekhar316:docs
Open

Adding Documentation For Optimizer #20
shekhar316 wants to merge 9 commits into
kruize:mvp_demofrom
shekhar316:docs

Conversation

@shekhar316
Copy link
Copy Markdown
Contributor

@shekhar316 shekhar316 commented May 5, 2026

Description

This PR adds a complete documentation suite for Kruize Optimizer, including installation guides, configuration references, API documentation, and contribution guidelines.

Changes

New Documentation Files

  • docs/README.md - Documentation home page with navigation to all guides
  • docs/installation.md - Installation guide for Kind, OpenShift, and local development
  • docs/build.md - Build guide for Docker images using build_and_push.sh script
  • docs/design.md - Architecture and design documentation with component diagrams
  • docs/optimizerAPI.md - API reference with OpenAPI/Swagger UI documentation
  • docs/configurables.md - Configuration guide for all environment variables and settings
  • docs/contributing.md - Contributing guide with testing guidelines and PR process

Type of Change

  • Documentation update

Testing

  • All documentation files reviewed for accuracy

Summary by Sourcery

Add a comprehensive documentation suite for Kruize Optimizer, covering installation, configuration, architecture, API usage, build process, and contribution workflow.

Documentation:

  • Introduce a documentation homepage linking all major guides for Kruize Optimizer.
  • Document installation and deployment for Kind, OpenShift, and local development setups.
  • Provide a configuration guide covering application settings, profiles, layers, deployment options, and environment variables.
  • Add architecture and design documentation describing components, data flow, and technology stack.
  • Publish an API reference for the Optimizer webhook endpoint and OpenAPI/Swagger tooling.
  • Document the build and image-publishing process using the provided build script and Maven/Jib.
  • Add a contributing guide detailing development setup, workflow, testing, code style, and PR process.

shekhar316 added 7 commits May 5, 2026 12:22
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
@shekhar316 shekhar316 self-assigned this May 5, 2026
@shekhar316 shekhar316 added documentation Improvements or additions to documentation Optimizer labels May 5, 2026
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 5, 2026

Reviewer's Guide

Adds a complete documentation suite for Kruize Optimizer, introducing a docs home, installation/build/configuration/design/API/contributing guides that describe how to install, configure, build, and integrate the optimizer with Kruize and Kubernetes/OpenShift environments.

Sequence diagram for bulk job creation and webhook processing

sequenceDiagram
    actor User
    participant Optimizer as KruizeOptimizer
    participant Scheduler as BulkSchedulerService
    participant Kruize as KruizeAutotuneService
    participant Metrics as Prometheus_Thanos
    participant DB as KruizeDB

    User->>Optimizer: configure_labels_and_profiles()
    Optimizer->>Scheduler: schedule_bulk_job(interval,measurement_duration)

    loop on_interval
        Scheduler->>Kruize: create_bulk_experiments(target_labels,profiles)
        Kruize->>Metrics: query_metrics_for_labeled_workloads()
        Metrics-->>Kruize: metrics_timeseries
        Kruize->>DB: store_experiments_and_recommendations()
        DB-->>Kruize: ack
        Kruize-->>Scheduler: bulk_job_ack(jobID)
    end

    Kruize-->>Optimizer: POST /webhook [job_summary_array]
    Optimizer->>Optimizer: process_webhook_payload(jobID,status,counters)
    Optimizer-->>User: updated_job_summary_and_status
Loading

File-Level Changes

Change Details Files
Introduce a documentation home page that links all major guides and external resources.
  • Add docs/README.md as the central docs entrypoint with navigation to installation, build, design, API, configuration, and contributing guides
  • Link back to the main project README, test documentation, Kruize design docs, and Kruize Operator repo
  • Document support and contribution channels (issues, discussions, PRs) and doc maintenance expectations
docs/README.md
Document installation flows for Kind, OpenShift, and local development, including verification and troubleshooting.
  • Describe prerequisites and required tooling for running Kruize Optimizer
  • Add Kind and OpenShift install steps using kustomize overlays with namespace/datasource differences called out
  • Document local dev setup, including application.yml Kruize URL configuration and use of quarkus:dev
  • Provide verification commands for pods/services/API plus troubleshooting for ports, pod startup, and Kruize connectivity
docs/installation.md
Provide a configuration reference covering application YAML, profiles, layers, deployment manifests, overlays, and environment variables.
  • Explain key sections of src/main/resources/application.yml and how to override via environment variables
  • Describe metadata profiles, metric profiles, and layer configs, including locations, update steps, and need to rebuild images after changes
  • Document base Kubernetes deployment and Kustomize overlays for Kind and OpenShift, highlighting default datasources and namespaces
  • List and describe all relevant Quarkus and Kruize-related environment variables and configuration best practices/validation commands
docs/configurables.md
Add an architecture and design document describing components, data flow, and planned enhancements.
  • Document high‑level architecture with a mermaid diagram showing Kruize, Optimizer, workloads, and data flow
  • Describe core layers (resource, service, model, client, configuration) and map them to existing Java classes and config files
  • Explain optimizer job/webhook/bulk flows and design principles like separation of concerns, profile-based configuration, statelessness
  • List current tech stack and outline future enhancements (logging/tracing, custom metrics, additional profiles)
docs/design.md
Publish an API reference focused on the optimizer webhook and use of OpenAPI/Swagger tooling.
  • Describe base URL, response envelope shape, standard error handling, and HTTP status codes
  • Document OpenAPI and Swagger UI endpoints and how to use them in dev mode and tooling
  • Provide a detailed spec and example for the /webhook POST endpoint, including payload schema, sample requests/responses, and configuration via KRUIZE_WEBHOOK_URL
  • Clarify that many other optimizer-exposed endpoints are slated for removal and point to Kruize service docs for their APIs
docs/optimizerAPI.md
Add a build guide for container images using the build_and_push.sh script and Quarkus Jib/Docker.
  • Describe build_and_push.sh location, options (image/tag/build/push/platforms) and typical workflows (dev vs release vs custom registry)
  • Show how to configure registry credentials and perform single/multi-platform builds
  • Document manual build paths using Maven+Jib flags and Dockerfiles (JVM/native) for advanced users
docs/build.md
Introduce a contributing guide covering dev setup, workflow, testing, style, PR process, and community links.
  • Specify prerequisites, repo fork/clone/upstream setup, and Quarkus dev flow with mvnw commands
  • Detail recommended IDE setups (IntelliJ/VS Code/Eclipse) and dependency on running Kruize/Kruize Operator or Autotune
  • Describe branching, coding, testing, commit, push, and PR steps including a PR template and checklist
  • Define Java code style, formatting, JavaDoc expectations, and documentation practices, plus community/support channels and pointers to other docs
docs/contributing.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 7 issues, and left some high level feedback:

  • There are a few minor content issues in the new docs (e.g., stray I at the end of the first paragraph in design.md, and the reference to Configurables.md in installation.md should match the actual filename configurables.md), which would be good to clean up for polish and navigation correctness.
  • In design.md there are still TODO markers for adding metadata/metric profile file links—either add the concrete links now or remove the TODOs to avoid shipping incomplete placeholders in user-facing docs.
  • Several hard-coded version statements (e.g., Java 25+, specific Kubernetes/OpenShift versions, Kruize Operator version >= 0.0.5) appear across the docs; it would be helpful to double-check that these match the actual minimum supported versions and won’t quickly become stale, or rephrase them to describe requirements more generically.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There are a few minor content issues in the new docs (e.g., stray `I` at the end of the first paragraph in `design.md`, and the reference to `Configurables.md` in `installation.md` should match the actual filename `configurables.md`), which would be good to clean up for polish and navigation correctness.
- In `design.md` there are still `TODO` markers for adding metadata/metric profile file links—either add the concrete links now or remove the TODOs to avoid shipping incomplete placeholders in user-facing docs.
- Several hard-coded version statements (e.g., Java 25+, specific Kubernetes/OpenShift versions, Kruize Operator version >= 0.0.5) appear across the docs; it would be helpful to double-check that these match the actual minimum supported versions and won’t quickly become stale, or rephrase them to describe requirements more generically.

## Individual Comments

### Comment 1
<location path="docs/installation.md" line_range="89" />
<code_context>
+
+### OpenShift Installation
+
+1. Login to your OpenShift cluster:
+```bash
+oc login <cluster-url>
</code_context>
<issue_to_address>
**suggestion (typo):** Use "Log in" (verb) instead of "Login" (noun) in this sentence

This line should be updated to: `1. Log in to your OpenShift cluster:` to use the verb form correctly.

```suggestion
1. Log in to your OpenShift cluster:
```
</issue_to_address>

### Comment 2
<location path="docs/installation.md" line_range="143-146" />
<code_context>
+
+```yaml
+# Update this section with your Kruize instance URL
+# Use different port-mapping for kruize pod, as 8080 will be used by optimizer
+kruize:
+  url: http://localhost:9090  # Change to your Kruize URL
</code_context>
<issue_to_address>
**suggestion (typo):** Capitalize "Kruize" consistently as a proper noun

Here, change "kruize pod" to "Kruize pod" to match the product name used elsewhere.

```suggestion
# Update this section with your Kruize instance URL
# Use different port-mapping for Kruize pod, as 8080 will be used by optimizer
kruize:
  url: http://localhost:9090  # Change to your Kruize URL
```
</issue_to_address>

### Comment 3
<location path="docs/design.md" line_range="17" />
<code_context>
+
+## Overview
+
+Kruize Optimizer is a cloud-native optimization service designed to provide support for Kruize (https://github.com/kruize/autotune) to automatically create experiments and manage Kruize profiles. I
+
+### Key Features
</code_context>
<issue_to_address>
**issue (typo):** Remove stray "I" at the end of the overview sentence

The line ends with an extra "I" after the period; please remove it so the sentence ends after "manage Kruize profiles."

```suggestion
Kruize Optimizer is a cloud-native optimization service designed to provide support for Kruize (https://github.com/kruize/autotune) to automatically create experiments and manage Kruize profiles.
```
</issue_to_address>

### Comment 4
<location path="docs/design.md" line_range="21" />
<code_context>
+
+### Key Features
+
+- **Automatic Experiment Creation**: Automatically Kruize create experiments based on user-defined labels
+- **Profile Management Support**: Manages and install missing profiles in Kruize
+- **Webhook Integration**: Event-driven architecture for real-time updates
</code_context>
<issue_to_address>
**suggestion (typo):** Fix grammar in "Automatically Kruize create experiments"

This wording is grammatically awkward. Please rephrase, for example:
- "Automatically create Kruize experiments based on user-defined labels", or
- "Automatically create experiments in Kruize based on user-defined labels."

```suggestion
- **Automatic Experiment Creation**: Automatically create experiments in Kruize based on user-defined labels.
```
</issue_to_address>

### Comment 5
<location path="docs/design.md" line_range="22" />
<code_context>
+### Key Features
+
+- **Automatic Experiment Creation**: Automatically Kruize create experiments based on user-defined labels
+- **Profile Management Support**: Manages and install missing profiles in Kruize
+- **Webhook Integration**: Event-driven architecture for real-time updates
+
</code_context>
<issue_to_address>
**suggestion (typo):** Use consistent verb form in "Manages and install"

Change this to: "Manages and installs missing profiles in Kruize."

Suggested implementation:

```
- **Profile Management Support**: Manages and installs missing profiles in Kruize.

```

The snippet you provided from `docs/design.md` only shows the first bullet under **Key Features**. Ensure that the second bullet (`Profile Management Support`) exists exactly as in the SEARCH text above; if the wording or spacing differs, adjust the SEARCH string accordingly so the replacement applies correctly.
</issue_to_address>

### Comment 6
<location path="docs/design.md" line_range="128-129" />
<code_context>
+**Configuration Types:**
+- **Metadata Profiles**: Define cluster and application metadata
+TODO: add metadata profiles file link
+- **Metric Profiles**: Specify metrics to collect and analyze
+TODO: add metric prifile file link
+- **Layer Configurations**: Container, JVM, and framework-specific settings
+
</code_context>
<issue_to_address>
**issue (typo):** Fix spelling of "prifile" in the TODO comment

In the TODO line, change "metric prifile" to "metric profile".

Suggested implementation:

```
- **Metric Profiles**: Specify metrics to collect and analyze
+TODO: add metric profile file link

```

None; the only required change is fixing the typo in the TODO comment.
</issue_to_address>

### Comment 7
<location path="docs/design.md" line_range="240" />
<code_context>
+
+## Future Enhancements Planned 
+
+- Improve Logging and Tracing
+- Add support for custom prometheus metrics
+- Add support for multiple custom pre-configured profiles for dev/prod clusters
+
</code_context>
<issue_to_address>
**suggestion (typo):** Capitalize "Prometheus" as a proper noun

This keeps it consistent with how we reference the monitoring system elsewhere in the docs.

```suggestion
- Add support for custom Prometheus metrics
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread docs/installation.md Outdated
Comment thread docs/installation.md
Comment thread docs/design.md Outdated
Comment thread docs/design.md Outdated
Comment thread docs/design.md Outdated
Comment thread docs/design.md Outdated
Comment thread docs/design.md Outdated
Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
@shekhar316
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In optimizerAPI.md, the table of contents and narrative still reference multiple APIs (Status, Datasource, Jobs, Profiles, Layers) that are marked as being removed in the next release but are not actually documented; consider either documenting the current endpoints or trimming the TOC/content to focus solely on the webhook to avoid confusing users about what is really supported.
  • Both installation.md and contributing.md repeat a substantial "Kruize Dependency" section almost verbatim; you might want to centralize that explanation in one doc and link to it from the other to reduce duplication and make future updates less error-prone.
  • Several docs (e.g., design.md, configurables.md, installation.md) describe components or configuration that are expected to change in the “next release” or are tightly coupled to specific versions; consider clearly scoping those statements to a specific version of the optimizer or adding a short “Compatibility/Version” note so users know when the docs might be outdated.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `optimizerAPI.md`, the table of contents and narrative still reference multiple APIs (Status, Datasource, Jobs, Profiles, Layers) that are marked as being removed in the next release but are not actually documented; consider either documenting the current endpoints or trimming the TOC/content to focus solely on the webhook to avoid confusing users about what is really supported.
- Both `installation.md` and `contributing.md` repeat a substantial "Kruize Dependency" section almost verbatim; you might want to centralize that explanation in one doc and link to it from the other to reduce duplication and make future updates less error-prone.
- Several docs (e.g., `design.md`, `configurables.md`, `installation.md`) describe components or configuration that are expected to change in the “next release” or are tightly coupled to specific versions; consider clearly scoping those statements to a specific version of the optimizer or adding a short “Compatibility/Version” note so users know when the docs might be outdated.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
@shekhar316
Copy link
Copy Markdown
Contributor Author

shekhar316 commented May 5, 2026

As suggested by sourcery -

  1. Updated Table of Contents in API docs.
  2. Removed duplicate parts from contributing and installation md files.
  3. Added a version number 0.0.1 for incompatible API.

Comment thread docs/installation.md

Once the service is running, verify API availability:

> NOTE: Support for the /health endpoint is pending and will be implemented in the next version.
Copy link
Copy Markdown

@afra-azf afra-azf May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note says support for the /health endpoint is still pending, but the very next commands instruct users to verify the service with curl http://localhost:8080/health and curl http:///health. Since that endpoint is not currently available, these validation steps will fail for anyone following the installation guide as written.

Can we either remove these /health checks for now or replace them with a verification step against a supported endpoint? That would keep the documentation aligned with the current implementation and avoid confusing users during setup.

Comment thread docs/installation.md
# Set environment variable before running
export QUARKUS_HTTP_PORT=9090
./mvnw quarkus:dev
```
Copy link
Copy Markdown

@afra-azf afra-azf May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scenario where Autotune was already running on port 8080 and I wanted to run Optimizer on a different port (for example, 8081), simply modifying kruize.url in application.yml did not work for me.

Analysis

This might be happening because:

  1. It does not address the missing KRUIZE_TARGET_LABELS environment variable.

  2. It does not resolve the port conflict when Autotune is already running on 8080.

  3. The kruize.url configuration only controls where the Optimizer connects to (i.e., Autotune’s endpoint), not where the Optimizer service itself runs.

Port Configuration Clarification

There are three separate configurations involved:
Autotune Port
Defines the port on which the Autotune backend runs.
Example: 8080
Optimizer Port
Defines the port on which the Optimizer service itself runs.
Configure using: -Dquarkus.http.port=XXXX
Example: 8081 or 9090
Connection URL (KRUIZE_URL)
Defines where the Optimizer connects to reach Autotune.
Example: http://localhost:8080

For example, changing kruize.url in application.yml to http://localhost:9090 does not make the Optimizer run on port 9090. It only instructs the Optimizer to look for the Autotune service at port 9090.

It may be helpful to add this clarification to the documentation to avoid confusion for users running Autotune and Optimizer on separate ports.

Comment thread docs/installation.md

- **Java**: Java 25 or higher (for building Kruize Optimizer)
- **Maven**: 3.6.0 or higher (for building from source)
- **Docker**: 20.10 or higher (for containerized deployment)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Docker-compatible runtimes such as Podman are supported, the document should say so explicitly. Otherwise, users may incorrectly conclude that a working environment is unsupported.
Suggestion :
Clarify whether the prerequisite is strictly Docker, or whether supported Docker-compatible runtimes such as Podman are acceptable.

Comment thread docs/installation.md
```bash
# Port forward to access locally
kubectl port-forward -n monitoring svc/kruize-optimizer 8080:8080

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the Kind installation steps worked up to deployment and service creation, and the pod eventually reached Running state. However, after the documented port-forward step at ['kubectl port-forward -n monitoring svc/kruize-optimizer 8080:8080'], the document only says to access ['http://localhost:8080']. In practice, hitting the root path returns a 404 error payload rather than a successful verification response.
A user can complete the documented steps and still be left thinking the deployment failed when the issue is actually the documented verification target.
Replace the bare root URL with an actual supported verification endpoint, such as ['/openapi'], or document the intended path explicitly.
Also , please mention that some browsers may download OpenAPI spec as a file instead of rendering it inline, which is still a valid sign that the endpoint is working.

Comment thread docs/contributing.md
./mvnw clean verify
```

4. **Update documentation** if needed
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbering is going from 1,2 to 4 , please correct this.

Comment thread docs/design.md

- Users will directly interact with Kruize APIs or UI.
- Optimizer will interact with Kruize to create experiments and manage profiles.
- The UI will interact with Kruize to display and manage experiments/ recommendations.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

experiments/recommendations -- you can remove space

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation Optimizer

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants