
[GOBBLIN-2247] Increase RPC retry budget to 10 min for Temporal gRPC throttling #4180

Merged
Blazer-007 merged 1 commit into apache:master from DaisyModi:dmodi/tune-rpc-retry-policy-for-throttling-resilience on Mar 31, 2026


@DaisyModi
Contributor

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Follow-up to #4176. The previous RPC retry defaults (initialInterval=500ms, backoffCoefficient=2.0, maximumInterval=30s, maximumAttempts=10) provided only ~2.5 minutes of cumulative retry budget, not much more than the 1-minute expiration in the Temporal SDK's built-in default (DefaultStubServiceOperationRpcRetryOptions). That is not enough to survive the sustained throttle bursts of up to 10 minutes observed in production.

Changes to default values in GobblinTemporalConfigurationKeys:

| Parameter | Old | New | Rationale |
|---|---|---|---|
| initialInterval | 500 ms | 1000 ms | Less aggressive on a throttled server |
| maximumInterval | 30 s | 60 s | Wider spacing between retries during sustained throttling |
| backoffCoefficient | 2.0 | 2.0 | Unchanged |
| maximumAttempts | 10 | 15 | Only 5 more attempts needed to reach ~10 min, thanks to the larger max interval |

With initialInterval=1s, backoffCoefficient=2.0, maximumInterval=60s, maximumAttempts=15, counting one backoff delay per attempt and ignoring jitter and RPC latency:

  • Delays 1–6: 1 + 2 + 4 + 8 + 16 + 32 = 63s (delay 7 would be 64s and is capped at 60s)
  • Delays 7–15: 9 × 60s = 540s
  • Total: ~603s (~10 min)
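The arithmetic above can be reproduced with a small standalone model. This is a sketch under stated assumptions, not Gobblin's or Temporal's retry code: it sums a given number of exponentially growing delays, each capped at maximumInterval, and ignores jitter and the time spent in the RPCs themselves. The class and method names (RetryBudget, totalBackoff) are hypothetical.

```java
import java.time.Duration;

/**
 * Hypothetical back-of-the-envelope model of an exponential retry policy.
 * Sums `intervals` backoff delays: initialInterval, then each subsequent delay
 * multiplied by backoffCoefficient, every delay capped at maximumInterval.
 * Jitter and RPC execution time are deliberately ignored.
 */
public final class RetryBudget {

    private RetryBudget() {}

    public static Duration totalBackoff(Duration initialInterval, double backoffCoefficient,
                                        Duration maximumInterval, int intervals) {
        long capMillis = maximumInterval.toMillis();
        double delayMillis = initialInterval.toMillis();
        long totalMillis = 0;
        for (int i = 0; i < intervals; i++) {
            // Add the current delay, never exceeding the cap.
            totalMillis += (long) Math.min(delayMillis, capMillis);
            // The next delay grows geometrically.
            delayMillis *= backoffCoefficient;
        }
        return Duration.ofMillis(totalMillis);
    }

    public static void main(String[] args) {
        Duration oldBudget = totalBackoff(Duration.ofMillis(500), 2.0, Duration.ofSeconds(30), 10);
        Duration newBudget = totalBackoff(Duration.ofSeconds(1), 2.0, Duration.ofSeconds(60), 15);
        System.out.println("old: " + oldBudget.toMillis() / 1000.0 + " s");
        System.out.println("new: " + newBudget.toMillis() / 1000.0 + " s");
    }
}
```

Running main prints `old: 151.5 s` and `new: 603.0 s`, matching the ~2.5 min and ~603s figures. Both assume one backoff delay per attempt; if the initial call counts toward maximumAttempts, the new policy has only 14 delays and `totalBackoff(Duration.ofSeconds(1), 2.0, Duration.ofSeconds(60), 14)` gives 543 s (~9 min).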

All values remain configurable via Typesafe Config.
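Since the values remain configurable, an explicitly configured value should still win and only the fallback defaults move. The sketch below illustrates that lookup pattern under loud assumptions: a plain java.util.Map stands in for the Typesafe Config object, and the class name RpcRetrySettings and the example.rpc.retry.* keys are hypothetical, not the real GobblinTemporalConfigurationKeys constants.

```java
import java.time.Duration;
import java.util.Map;

/**
 * Hypothetical sketch of "configured value wins, otherwise use the tuned default".
 * A Map stands in for Typesafe Config; key names are illustrative only.
 */
public final class RpcRetrySettings {

    // Default values introduced by this PR.
    static final long DEFAULT_INITIAL_INTERVAL_MS = 1_000L;
    static final long DEFAULT_MAXIMUM_INTERVAL_MS = 60_000L;
    static final double DEFAULT_BACKOFF_COEFFICIENT = 2.0;
    static final int DEFAULT_MAXIMUM_ATTEMPTS = 15;

    // Hypothetical key names, NOT the real Gobblin configuration keys.
    static final String INITIAL_INTERVAL_MS_KEY = "example.rpc.retry.initialIntervalMs";
    static final String MAXIMUM_INTERVAL_MS_KEY = "example.rpc.retry.maximumIntervalMs";
    static final String BACKOFF_COEFFICIENT_KEY = "example.rpc.retry.backoffCoefficient";
    static final String MAXIMUM_ATTEMPTS_KEY = "example.rpc.retry.maximumAttempts";

    public final Duration initialInterval;
    public final Duration maximumInterval;
    public final double backoffCoefficient;
    public final int maximumAttempts;

    private RpcRetrySettings(Duration initialInterval, Duration maximumInterval,
                             double backoffCoefficient, int maximumAttempts) {
        this.initialInterval = initialInterval;
        this.maximumInterval = maximumInterval;
        this.backoffCoefficient = backoffCoefficient;
        this.maximumAttempts = maximumAttempts;
    }

    /** Reads each parameter from the config if present, else falls back to its default. */
    public static RpcRetrySettings resolve(Map<String, String> config) {
        long initialMs = Long.parseLong(
            config.getOrDefault(INITIAL_INTERVAL_MS_KEY, String.valueOf(DEFAULT_INITIAL_INTERVAL_MS)));
        long maximumMs = Long.parseLong(
            config.getOrDefault(MAXIMUM_INTERVAL_MS_KEY, String.valueOf(DEFAULT_MAXIMUM_INTERVAL_MS)));
        double coefficient = Double.parseDouble(
            config.getOrDefault(BACKOFF_COEFFICIENT_KEY, String.valueOf(DEFAULT_BACKOFF_COEFFICIENT)));
        int attempts = Integer.parseInt(
            config.getOrDefault(MAXIMUM_ATTEMPTS_KEY, String.valueOf(DEFAULT_MAXIMUM_ATTEMPTS)));
        return new RpcRetrySettings(
            Duration.ofMillis(initialMs), Duration.ofMillis(maximumMs), coefficient, attempts);
    }

    public static void main(String[] args) {
        // No overrides: the tuned defaults apply.
        RpcRetrySettings defaults = resolve(Map.of());
        // A deployment that wants a longer budget overrides a single key.
        RpcRetrySettings overridden = resolve(Map.of(MAXIMUM_ATTEMPTS_KEY, "20"));
        System.out.println(defaults.maximumAttempts + " -> " + overridden.maximumAttempts);
    }
}
```

Here main prints `15 -> 20`: the unset parameters keep the new defaults, and only the overridden one changes.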

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Config-only change with no logic changes: buildRpcRetryOptions is unchanged, and only the default constants are tuned.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message".

…RPC throttling

The previous defaults (~2.5 min) were insufficient to survive sustained throttle
bursts. Tuned initialInterval to 1s, maximumInterval to 60s, and maximumAttempts
to 15, yielding ~603s (~10 min) of cumulative retry coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DaisyModi force-pushed the dmodi/tune-rpc-retry-policy-for-throttling-resilience branch from 7eebddc to 911a571 on March 27, 2026 10:15
@codecov-commenter

codecov-commenter commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 38.75%. Comparing base (26406a8) to head (911a571).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4180      +/-   ##
============================================
- Coverage     43.26%   38.75%   -4.52%     
+ Complexity     2583     1600     -983     
============================================
  Files           516      389     -127     
  Lines         22087    16030    -6057     
  Branches       2505     1590     -915     
============================================
- Hits           9557     6212    -3345     
+ Misses        11566     9319    -2247     
+ Partials        964      499     -465     


Blazer-007 merged commit c25746b into apache:master on Mar 31, 2026
6 checks passed

3 participants