Skip to content

Fix: resolve webhook retry processor conflict and share retry config (#8)#28

Open
Sam-Rytech wants to merge 24 commits into
LabsCrypt:mainfrom
Sam-Rytech:fix-webhook-retry
Open

Fix: resolve webhook retry processor conflict and share retry config (#8)#28
Sam-Rytech wants to merge 24 commits into
LabsCrypt:mainfrom
Sam-Rytech:fix-webhook-retry

Conversation

@Sam-Rytech

Copy link
Copy Markdown
Contributor

Description

This PR resolves the conflicting retry implementations between webhookRetryScheduler and webhookService.

Changes Made

  • Picked webhookRetryProcessor.ts as the single surviving implementation and wired it into index.ts.
  • Deleted the conflicting webhookRetryScheduler.ts to remove duplicate sources of truth.
  • Extracted backoff configurations and max attempts into a single shared WEBHOOK_RETRY_CONFIG used by webhookService.ts.
  • Updated webhook retry SQL queries to properly alias columns (wd.id, ws.callback_url, etc.), avoiding ambiguous references and mapping correctly to the current schema.
  • Added a test in webhookRetryProcessor.test.ts asserting that the processor respects the configured backoff delays and max attempts.

Closes #8

devfoma and others added 10 commits June 18, 2026 13:42
- Removed webhookRetryScheduler.ts as its logic conflicts with webhookService.ts
- Extracted backoff config and max attempts into a shared config WEBHOOK_RETRY_CONFIG
- Updated WebhookService.processRetries query to properly alias columns to avoid ambiguous references and to utilize the new config
- Updated index.ts to wire up the surviving webhookRetryProcessor
- Added test verifying retry processor respects backoff delays and max attempts

Closes #123
- Add POST /admin/approve-loan endpoint with JWT auth and admin RBAC
- Add approveLoan controller function in indexerController.ts
- Add approveLoanBodySchema with zod validation for loanId
- Add Swagger documentation for the new endpoint
- Add 5 tests covering: successful build, non-admin rejection, missing loanId, invalid loanId, and missing auth
- All 220 tests pass
Closes LabsCrypt#5

Two concurrent indexer instances could both read the same cursor from
indexer_state, fetch the same ledger range, and fire webhooks/notifications
before the ON CONFLICT DO NOTHING result was known — wasting RPC calls
and risking duplicate side-effect dispatch.

Changes:
- pollOnce() now acquires pg_try_advisory_lock(738154291) on a dedicated
  connection at the start of each cycle. A second instance that calls
  pg_try_advisory_lock while the first holds it gets false immediately and
  skips the cycle rather than blocking.
- The lock is released in a finally block on the same connection that
  acquired it; the connection is always returned to the pool regardless of
  whether the cycle succeeds or throws.
- Side effects (webhook dispatch, notifications) were already gated on
  INSERT ... RETURNING rowCount > 0, so only the instance that wins the
  insert fires them. Added a test to document and protect this guarantee.
- Added mockLockClient default in src/__tests__/eventIndexer.test.ts so
  existing integration-style tests that call pollOnce() directly continue
  to work without changes.
- Three new tests in services/__tests__/eventIndexer.test.ts cover:
  1. Lock-not-acquired path: no DB writes, no dispatches, client released
  2. Lock acquired but processing throws: unlock + client release still run
  3. Two instances same event: only the inserting instance dispatches
- Add debug log line when advisory lock is successfully acquired so
  operators can confirm which instance is active in multi-pod deployments
- Add mockLogger.debug to src/__tests__/eventIndexer.test.ts to match
- Add test: restarted instance reads persisted cursor (last_indexed_ledger=500)
  and resumes from ledger 501, not from 0 — covers acceptance criterion 3
test: add unit tests for SorobanService and mock Stellar RPC
…stamp-collisions

fix: resolve migration timestamp collisions
…-loan-endpoint

feat(admin): add approve_loan endpoint for admin loan approval

@ogazboiz ogazboiz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the shared retry config part is done right, WEBHOOK_RETRY_CONFIG is exported once from webhookService.ts and consumed there, the duplicate processor file is gone and index.ts is rewired consistently. but the branch does not build or test yet, and there's a design regression to sort:

  1. build and test break on a dangling import of a deleted module. src/tests/distributedLock.test.ts:38-39 still does await import("../services/webhookRetryScheduler.js") and tests retryFailedWebhooks at line 53, but this pr deleted that file. npm run build fails with TS2307 (cannot find module ../services/webhookRetryScheduler.js) and the test errors at runtime. remove or rewrite that describe block (the import at ~38-39 and the matching describe("webhookRetryScheduler", ...)), or point it at the new processor.

  2. this drops the multi-instance safety that #23 just added. you deleted the lock-protected webhookRetryScheduler.ts (it took a redis lock via cacheService.setNotExists(...:running)) and kept the unprotected webhookRetryProcessor.ts, which runs processRetries() every 10s with no lock. processRetries() in webhookService.ts:290 does a plain SELECT ... LIMIT 100 with no FOR UPDATE SKIP LOCKED, so two instances will grab the same rows and double-send the webhooks. either bring the redis lock back around the processor interval, or add FOR UPDATE SKIP LOCKED and mark rows in-flight.

  3. prettier fails on src/services/__tests__/webhookRetryProcessor.test.ts and src/services/webhookService.ts. run npx prettier --write on both.

  4. minor: MAX_RETRY_ATTEMPTS is hardcoded to 4 in webhookService.ts, replacing the old derived RETRY_DELAYS_MS.length + 1. correct today (3+1) but it will silently drift if the delays array changes. consider RETRY_DELAYS_MS.length + 1.

get build + test + prettier green and restore the lock semantics and this is close.

if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0

nonsobethel0-dev and others added 9 commits June 20, 2026 10:18
…, format

- Add getClient to connection.js mocks in loanDispute, remittanceService,
  notificationCleanup, and auditLog tests (prevented 5 test failures due
  to missing export from ESM mock)
- Fix TypeScript errors in src/__tests__/eventIndexer.test.ts: type
  getClient mock with explicit generic, fix mock.calls cast for filter,
  fix mockRejectedValueOnce typing, cast mockImplementation, add optional
  chain on queriedRanges[0]
- Run prettier on src/services/__tests__/eventIndexer.test.ts
…exer-advisory-lock

fix: guard event indexer poll cycle with Postgres advisory lock
- Add missing environment variables to CI workflow

- Ignore demo and mock files in ESLint config to prevent linting failures

Note: As part of adding the CI workflow, the build/typecheck scripts were rewritten, redis mock was dropped from jest config, and the Dockerfile was edited to better support CI.
… timeouts

Fixes LabsCrypt/remitlend-backend issue LabsCrypt#16.

- Dockerfile: add HEALTHCHECK that probes /health on the exposed port using
  busybox wget (no extra packages required), with sensible
  interval/timeout/start-period/retries values (30s/10s/40s/3) and a defense-
  in-depth wget --timeout=8.
- src/app.ts: wrap each /health dependency probe (DB, Redis, Soroban RPC)
  in a Promise.race with a per-check timeout (default 2000ms, configurable
  via HEALTH_CHECK_TIMEOUT_MS) so a hung dependency degrades to "error"
  rather than blocking the whole endpoint and starving the Docker probe.
- src/__tests__/health.test.ts: add resilience tests verifying the endpoint
  fails fast with the right error status when a dependency hangs and that a
  custom HEALTH_CHECK_TIMEOUT_MS override is honored; clean up env vars
  via afterEach to prevent leaks across failures.
…16-dockerfile-healthcheck

chore(infra): add Docker HEALTHCHECK for /health endpoint (fixes LabsCrypt#16)

@ogazboiz ogazboiz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is basically there, all four items from my last review are fixed: the dangling import of the deleted scheduler is gone (now uses WebhookService.processRetries()), the multi-instance lock is back (webhook_retry_scheduler:running, NX, 120s TTL, released in finally), the shared WEBHOOK_RETRY_CONFIG/RETRY_DELAYS_MS is defined once with no divergent duplicate, and MAX_RETRY_ATTEMPTS is derived (RETRY_DELAYS_MS.length + 1). just two mechanical things:

  1. CI build-and-test is failing on the lint step: src/tests/distributedLock.test.ts:37:27 prettier/prettier "Delete CRLF" (a new prettier slip in the file you edited). run: npx prettier --write src/tests/distributedLock.test.ts

  2. rebase onto main, you're 17 behind (predates the CI + migration merges): git fetch origin && git rebase origin/main

fix that one prettier line, rebase, and i'll merge.

if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0

- Removed webhookRetryScheduler.ts as its logic conflicts with webhookService.ts
- Extracted backoff config and max attempts into a shared config WEBHOOK_RETRY_CONFIG
- Updated WebhookService.processRetries query to properly alias columns to avoid ambiguous references and to utilize the new config
- Updated index.ts to wire up the surviving webhookRetryProcessor
- Added test verifying retry processor respects backoff delays and max attempts

Closes #123

@ogazboiz ogazboiz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're rebased now (merge-base is current main, the prior 17-behind is resolved) and all four logic items are still fixed, the lock, the derived MAX_RETRY_ATTEMPTS, the single shared config, no dangling import. the red CI check showing right now is stale (pre-rebase). just one new prettier slip:

  1. src/tests/distributedLock.test.ts:37 fails prettier (the await import is split across 37-38, prettier wants it on one line). run: npx prettier --write src/tests/distributedLock.test.ts then push so CI re-runs fresh.

once that pushes green i'll merge.

if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0

@Sam-Rytech

Copy link
Copy Markdown
Contributor Author

@ogazboiz made the requested adjustment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Backend] Two webhook retry implementations diverge and one is wired up, the other is dead

8 participants