Add performance tuning section for high-throughput Runners #1797

Merged: fdevans merged 2 commits into 4.0.x from docs/runner-performance-tuning on May 1, 2026
@ltamaster
Contributor

Summary

Adds a new Performance tuning for high-throughput Runners section to runner-config.md that documents the main JVM system properties available when users hit the server-side error:

Failed: IOFailure: Runner did not deliver reports in the configured timeout period

This error indicates the Runner is saturated (not that the operation itself hung), and until now there was no customer-facing documentation describing how to scale the Runner to handle higher concurrency or log throughput.

Properties documented

  • Operation concurrency: runner.operations.maxRunning (default 50)
  • Report delivery (with warning):
    • runner.reporter.sendRate (default 2s)
    • runner.reporter.sendBatchSize (default 1000)
  • Micronaut HTTP client:
    • micronaut.http.client.pool.max-connections (default 50)
    • micronaut.http.client.pool.acquire-timeout (default 10s)
    • micronaut.http.client.read-timeout (default 60s)
    • micronaut.http.client.connect-timeout (default 10s)

Each property entry describes its purpose and when to increase or decrease it. Each subsection includes a runnable java -D... -jar pd-runner.jar example.
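As a rough illustration of how the properties listed above fit together on one command line, the sketch below assembles them with their documented defaults, so it describes the baseline rather than a tuned setup. It assumes the duration values are accepted exactly as written in the list (2s, 10s, 60s), and the RUNNER_JAR and RUNNER_OPTS variable names are hypothetical helpers, not part of the Runner.

```shell
#!/bin/sh
# Sketch only: collect the documented tuning properties in one place.
# Values are the defaults from the list above; the duration syntax
# (2s, 10s, 60s) is assumed to be accepted as written.
RUNNER_JAR="pd-runner.jar"   # replace with the jar filename you downloaded

RUNNER_OPTS="-Drunner.operations.maxRunning=50"
RUNNER_OPTS="$RUNNER_OPTS -Drunner.reporter.sendRate=2s"
RUNNER_OPTS="$RUNNER_OPTS -Drunner.reporter.sendBatchSize=1000"
RUNNER_OPTS="$RUNNER_OPTS -Dmicronaut.http.client.pool.max-connections=50"
RUNNER_OPTS="$RUNNER_OPTS -Dmicronaut.http.client.pool.acquire-timeout=10s"
RUNNER_OPTS="$RUNNER_OPTS -Dmicronaut.http.client.read-timeout=60s"
RUNNER_OPTS="$RUNNER_OPTS -Dmicronaut.http.client.connect-timeout=10s"

# Print the command instead of running it, so it can be reviewed first.
echo "java $RUNNER_OPTS -jar $RUNNER_JAR"
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; change one value at a time and compare against this baseline.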

Design notes

  • The Report delivery section opens with an admonition warning that changing these values increases the HTTP request rate/size reaching the server and can raise CPU/memory/database load on busy deployments. The defaults are explicitly recommended.
  • Internal-only levers (micronaut.executors.*, micronaut.netty.*, polling rates, kill switches) are deliberately omitted — they are either risky to expose or not related to the timeout class this section targets.
  • No dashboard references are included since the relevant Grafana dashboards are internal for now. Once metrics documentation is written, a follow-up can link from this section.
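The concern behind the report delivery warning can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes each Runner sends at most one report request per sendRate interval, which is an interpretation of the property and not something this PR states. The fleet size and the lowered value are hypothetical.

```shell
#!/bin/sh
# Assumption: a Runner sends at most one report request per sendRate interval.
RUNNERS=20          # hypothetical number of Runners
DEFAULT_RATE_S=2    # documented default for runner.reporter.sendRate
TUNED_RATE_S=1      # hypothetical lowered value

# Upper bound on report requests per minute reaching the server.
DEFAULT_RPM=$(( RUNNERS * 60 / DEFAULT_RATE_S ))
TUNED_RPM=$(( RUNNERS * 60 / TUNED_RATE_S ))

echo "default: up to $DEFAULT_RPM report requests/min"
echo "tuned:   up to $TUNED_RPM report requests/min"
```

Under that assumption, halving the interval doubles the upper bound on requests the server must absorb, which is why the defaults are recommended on busy deployments.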

Test plan

  • Review the rendered markdown for proper formatting (headings, admonition, code blocks)
  • Verify the internal anchor #configure-java-heap-size still resolves (used in the maxRunning caveat)
  • Confirm the admonition renders correctly with the docs site theme

🤖 Generated with Claude Code

Document the JVM system properties available to tune Runner capacity
when users hit "Runner did not deliver reports in the configured
timeout period" errors under heavy load.

Covers:
- Operation concurrency (runner.operations.maxRunning)
- Report delivery batching (sendRate, sendBatchSize) with a warning
  that changing these puts additional load on the server and the
  defaults are recommended
- Micronaut HTTP client pool and timeouts

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ltamaster ltamaster requested review from a team and Copilot April 27, 2026 18:29

Copilot AI left a comment

Pull request overview

Adds customer-facing documentation for tuning Runner performance when high concurrency/log throughput leads to report-delivery timeouts.

Changes:

  • Introduces a new “Performance tuning for high-throughput Runners” section explaining the timeout symptom and what it indicates.
  • Documents key JVM -D properties for operation concurrency, report batching/flush behavior (with warning), and Micronaut HTTP client pool/timeouts.
  • Provides runnable java -D... -jar ... examples for each tuning area.

Comment on lines +83 to +87
Example — double the default concurrency:

```
java -Drunner.operations.maxRunning=100 -Xmx8g -jar pd-runner.jar
```
Copilot AI Apr 27, 2026

The examples in this new section use pd-runner.jar, but earlier on the same page other examples use runner.jar / pdrunner.jar. This inconsistency can confuse readers who copy/paste commands. Consider standardizing the jar filename across this page, or add a short note in this new section clarifying that pd-runner.jar is the expected Runner jar name (or that it should be replaced with the actual jar filename the user downloaded).

Contributor Author

@copilot apply changes based on this feedback

Contributor

Standardized all jar filename references across the page to pd-runner.jar in commit 2408478. This covers the proxy section (pdrunner.jar → pd-runner.jar) and the heap size/temp dir sections (runner.jar → pd-runner.jar), so all examples on the page now match.
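One way to check that no old jar names remain after a change like this is a quick grep. The sketch below runs against a small hypothetical sample file standing in for runner-config.md, since the real page content is not shown in this conversation. The pattern is an illustration, not the check used in the commit.

```shell
#!/bin/sh
# Hypothetical sample standing in for runner-config.md.
DOC="$(mktemp)"
cat > "$DOC" <<'EOF'
java -Xmx4g -jar runner.jar
java -jar pdrunner.jar
java -Drunner.operations.maxRunning=100 -Xmx8g -jar pd-runner.jar
EOF

# Match runner.jar or pdrunner.jar, but not the standardized pd-runner.jar:
# the name must not be preceded by a hyphen or a letter.
STALE="$(grep -nE '(^|[^-A-Za-z])(pd)?runner\.jar' "$DOC")"
STALE_COUNT="$(printf '%s\n' "$STALE" | grep -c .)"

echo "$STALE"
echo "stale references: $STALE_COUNT"
rm -f "$DOC"
```

Pointing DOC at the real file instead of the temporary sample should report zero stale references once the standardization is complete.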

@ltamaster ltamaster added this to the 6.0.0 milestone Apr 28, 2026
@fdevans fdevans merged commit d1ad616 into 4.0.x May 1, 2026
3 checks passed
@fdevans fdevans deleted the docs/runner-performance-tuning branch May 1, 2026 18:25
@ronaveva ronaveva modified the milestones: 6.0.0, 5.20.1 May 19, 2026
5 participants