Skip to content

fix(onramp): fail-closed when client IP cannot be determined#311

Merged
piyalbasu merged 4 commits into
mainfrom
security/onramp-fail-closed
May 6, 2026
Merged

fix(onramp): fail-closed when client IP cannot be determined#311
piyalbasu merged 4 commits into
mainfrom
security/onramp-fail-closed

Conversation

@piyalbasu
Copy link
Copy Markdown
Contributor

Summary

Follow-up to #310. The IP-binding stopgap silently degraded to "no binding" on default deployments where FREIGHTER_TRUST_PROXY_RANGE was unset — the handler dropped clientIp and proceeded, minting an unbound Coinbase session against SDF's API credentials. SDF prod was not exposed (the env var is set explicitly to 172.16.0.0/12 in all three envs), but third-party deployers and any future SDF deployment with a topology change were vulnerable.

This PR closes the bug class and tightens the surrounding surface:

  • Fail-closed in the handler. /onramp/token now returns 400 Could not determine client IP for Coinbase session binding when request.ip resolves to an internal address (loopback, RFC1918, link-local, IPv6 ULA), instead of dropping clientIp and proceeding (src/route/index.ts:1518-1532).
  • Default for FREIGHTER_TRUST_PROXY_RANGE. Falls through to "loopback,linklocal,uniquelocal" when the value is empty. Covers typical k8s / load-balancer topologies without per-deployment configuration. The env var must still be declared in .env (it's in ENV_KEYS) — .env-EXAMPLE already declares it empty, which is the path of least resistance.
  • Boot-time validation. initApiServer throws if FREIGHTER_TRUST_PROXY_RANGE is non-empty but parses to no subnets (whitespace, ", , ,", etc.). Catches misconfigurations before they go live.
  • clientIp is now required. Was clientIp?: string in fetchOnrampSessionToken with a conditional spread that omitted the field when undefined. Locks the IP-binding invariant at the type level — future callers can't re-introduce the unbound-token bug.
  • Outbound Coinbase fetch timeout. AbortSignal.timeout(10_000) prevents slow-Coinbase amplification of resource consumption.
  • Per-route rate limit on /onramp/token. Tightened to 20 req/min/IP (vs the global 100 req/min/IP). Inherits the global Redis store via @fastify/rate-limit's parent-config merge.

This PR explicitly reverses the fail-open design from #310. PR #310's commit message described "dropping clientIp keeps the endpoint functional while we surface the misconfiguration via a warning log" — that trade was correct given the hotfix scope at the time, but a security report flagged that an unbound session is worse than no session (it converts the endpoint into an unauthenticated phishing primitive against SDF's Coinbase merchant credentials). The full ownership-proof mitigation (wallet-signature challenge / Origin + per-install HMAC) is still tracked separately and will close the remaining browser-based phishing vector.

The wallet-eng-runbooks PR adds a Cause D to the existing onramp incident runbook covering the new 400 — diagnosis (grep the warning log, inspect rawIp / socketRemote), and resolution (which trust ranges to use for which topology changes — cluster recreation, service mesh, CNI swap, new node groups).

Test plan

  • npx jest — 140 passed, 3 skipped, 0 failures
  • npx tsc --noEmit — clean
  • Confirm /onramp/token still works in wallet-eng-stg after deploy (XFF header arrives, clientIp forwarded to Coinbase, session minted)
  • Confirm the warning-log rate stays at 0 in observability (no real users hitting the fail-closed path)
  • Manually verify boot-time validation by setting FREIGHTER_TRUST_PROXY_RANGE=" " in a dev container and observing the startup error

🤖 Generated with Claude Code

Follow-up to #310. The IP-binding stopgap silently degraded to "no
binding" on default deployments because FREIGHTER_TRUST_PROXY_RANGE
had no default and the handler dropped clientIp + proceeded when
request.ip resolved to an internal address. This change makes the
endpoint fail-closed and tightens the surrounding surface:

- Handler returns 400 "Could not determine client IP for Coinbase
  session binding" when request.ip is internal, instead of dropping
  clientIp and minting an unbound session.
- FREIGHTER_TRUST_PROXY_RANGE defaults to "loopback,linklocal,
  uniquelocal" when the value is empty (the env var must still be
  declared per ENV_KEYS; .env-EXAMPLE already declares it empty).
- initApiServer throws at boot if the var is non-empty but parses
  to no subnets (whitespace-only, ", , ,", etc.).
- clientIp is now required on fetchOnrampSessionToken; the
  conditional spread that omitted it when undefined is removed.
  Locks the IP-binding invariant at the type level so future
  callers can't re-introduce the unbound-token bug.
- Outbound Coinbase fetch gets a 10s AbortSignal.timeout to prevent
  slow-Coinbase amplification of resource consumption.
- Per-route rate limit on /onramp/token tightened to 20/min/IP
  (vs the global 100/min/IP); inherits the global Redis store.

Tests cover the fail-closed 400, the AbortSignal attachment, and
the boot-time validation throw. README documents the env-var
declaration requirement and the local-dev workflow (curl + XFF
header, or a Caddy reverse proxy for the Freighter extension).
docs/runbook.md updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 5, 2026 20:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Tightens Coinbase onramp session IP-binding by making /onramp/token fail closed when a public client IP can’t be reliably derived, and hardens the surrounding configuration and operational controls (defaults, boot-time validation, timeouts, and rate limiting).

Changes:

  • Added boot-time parsing/validation for FREIGHTER_TRUST_PROXY_RANGE and updated Fastify trustProxy wiring.
  • Updated /onramp/token to refuse issuing a Coinbase session when request.ip is internal/private, and added a stricter per-route rate limit.
  • Made clientIp required for Coinbase token minting, added an outbound fetch timeout, and updated tests/docs accordingly.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/route/index.ts Validates/compiles trust proxy configuration; tightens /onramp/token behavior and rate limiting.
src/route/index.test.ts Adds coverage for trust-proxy boot validation and fail-closed behavior; updates onramp route expectations.
src/helper/test-helper.ts Changes dev server default trustProxyRange to match new default trust behavior for tests.
src/helper/onramp.ts Requires clientIp and adds an outbound timeout for Coinbase token requests.
src/helper/onramp.test.ts Updates tests for required clientIp and timeout signal attachment.
src/config.ts Adds a default value for trustProxyRange when unset/empty.
README.md Documents required env key behavior and how to test /onramp/token locally under fail-closed semantics.
docs/runbook.md Updates runbook configuration guidance for FREIGHTER_TRUST_PROXY_RANGE and the new fail-closed behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/config.ts
Comment thread src/route/index.ts
Comment thread README.md Outdated
piyalbasu and others added 2 commits May 5, 2026 16:57
Address Copilot review feedback on PR #311. The ENV_KEYS presence
check uses `process.env[key]` (truthy), so an empty-string value in
a k8s ConfigMap or shell env counts as "missing" and throws
INVALID_ENV — the new default in buildConfig was unreachable for any
deployment that didn't put the key in a parsed .env file.

Removing the key from ENV_KEYS lets the buildConfig default fall
through naturally:

- Unset entirely (no .env, no process.env) → default applies.
- Declared empty in any source → default applies.
- Declared with a value → that value used.

Updates README.md and docs/runbook.md to drop the now-incorrect
"must be declared" framing and describe the var as optional with a
documented default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address belt-and-suspenders concern around suggestion #3 from the
security report (boot fail-fast for unset trust range). Boot
fail-fast is impractical without operational fragility, so document
the topology decisions and required observability instead:

- Decision table mapping common deployment topologies (k8s+LB, CDN,
  service mesh, bare metal) to recommended FREIGHTER_TRUST_PROXY_RANGE
  values, with rationale for each.
- Specifically calls out that the default does NOT trust CDN IPs,
  so CDN-fronted deployers must configure explicit CIDRs to avoid
  binding sessions to shared CDN egress IPs.
- Required observability checklist: alert on the
  "Refusing to issue an unbound Coinbase session" warning log and on
  /onramp/token 4xx rate, plus Coinbase session-redemption-failure
  rate for CDN-fronted setups.
- Links to wallet-eng-runbooks Cause D runbook for diagnosis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread README.md
- Alert on `/api/v1/onramp/token` 4xx rate. SDF's alert for this is [`freighter-backend-onramp-token-error-rate-high`](https://github.com/stellar/wallet-eng-runbooks/blob/main/freighter-backend/runbooks/application/freighter-backend-onramp-token-error-rate-high.md) (warning severity, 30-min escalation). The runbook's `Cause D` walks oncall through diagnosis.
- For CDN-fronted deployments specifically, also alert on Coinbase session-redemption-failure rate if Coinbase exposes one to the merchant — that's the signal for _wrong-but-public_ IP bindings (the failure mode the application can't detect itself).

### Testing `/onramp/token` locally
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I acknowledge that this causes some annoying overhead with running Freighter BE and using the onramp locally, but ultimately, we will be moving to authenticated requests to Freighter's backend, so this is some short-lived poor devx

@aristidesstaffieri
Copy link
Copy Markdown
Contributor

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@aristidesstaffieri
Copy link
Copy Markdown
Contributor

The docs include several references to SDF specific infra and internal processes. Should the docs be more generic for general operators outside of the SDF?

@piyalbasu
Copy link
Copy Markdown
Contributor Author

The docs include several references to SDF specific infra and internal processes. Should the docs be more generic for general operators outside of the SDF?

Good callout - I'll do that

Address review feedback on PR #311 from @aristidesstaffieri — the
docs included references to SDF-specific infrastructure (the
wallet-eng-runbooks alert URL, "SDF prod uses this shape") that
aren't useful to general operators running their own
freighter-backend deployment.

- Decision table row about explicit-CIDR config now describes the
  use case ("you know your VPC CIDR exactly and want to reject other
  private ranges") instead of pointing to SDF prod as an example.
- Observability section dropped the SDF-specific Prometheus alert
  link and rewrote each bullet to describe what to alert on, what
  signal to inspect, and how to interpret it — generic enough for
  any deployer regardless of their monitoring stack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@piyalbasu piyalbasu merged commit 1541b62 into main May 6, 2026
6 checks passed
@piyalbasu piyalbasu deleted the security/onramp-fail-closed branch May 6, 2026 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants