Fix: resolve webhook retry processor conflict and share retry config (#8)#28
Fix: resolve webhook retry processor conflict and share retry config (#8)#28Sam-Rytech wants to merge 24 commits into
Conversation
- Removed webhookRetryScheduler.ts as its logic conflicts with webhookService.ts - Extracted backoff config and max attempts into a shared config WEBHOOK_RETRY_CONFIG - Updated WebhookService.processRetries query to properly alias columns to avoid ambiguous references and to utilize the new config - Updated index.ts to wire up the surviving webhookRetryProcessor - Added test verifying retry processor respects backoff delays and max attempts Closes #123
- Add POST /admin/approve-loan endpoint with JWT auth and admin RBAC - Add approveLoan controller function in indexerController.ts - Add approveLoanBodySchema with zod validation for loanId - Add Swagger documentation for the new endpoint - Add 5 tests covering: successful build, non-admin rejection, missing loanId, invalid loanId, and missing auth - All 220 tests pass
Closes LabsCrypt#5 Two concurrent indexer instances could both read the same cursor from indexer_state, fetch the same ledger range, and fire webhooks/notifications before the ON CONFLICT DO NOTHING result was known — wasting RPC calls and risking duplicate side-effect dispatch. Changes: - pollOnce() now acquires pg_try_advisory_lock(738154291) on a dedicated connection at the start of each cycle. A second instance that calls pg_try_advisory_lock while the first holds it gets false immediately and skips the cycle rather than blocking. - The lock is released in a finally block on the same connection that acquired it; the connection is always returned to the pool regardless of whether the cycle succeeds or throws. - Side effects (webhook dispatch, notifications) were already gated on INSERT ... RETURNING rowCount > 0, so only the instance that wins the insert fires them. Added a test to document and protect this guarantee. - Added mockLockClient default in src/__tests__/eventIndexer.test.ts so existing integration-style tests that call pollOnce() directly continue to work without changes. - Three new tests in services/__tests__/eventIndexer.test.ts cover: 1. Lock-not-acquired path: no DB writes, no dispatches, client released 2. Lock acquired but processing throws: unlock + client release still run 3. Two instances same event: only the inserting instance dispatches
- Add debug log line when advisory lock is successfully acquired so operators can confirm which instance is active in multi-pod deployments - Add mockLogger.debug to src/__tests__/eventIndexer.test.ts to match - Add test: restarted instance reads persisted cursor (last_indexed_ledger=500) and resumes from ledger 501, not from 0 — covers acceptance criterion 3
test: add unit tests for SorobanService and mock Stellar RPC
…stamp-collisions fix: resolve migration timestamp collisions
…-loan-endpoint feat(admin): add approve_loan endpoint for admin loan approval
ogazboiz
left a comment
There was a problem hiding this comment.
the shared retry config part is done right, WEBHOOK_RETRY_CONFIG is exported once from webhookService.ts and consumed there, the duplicate processor file is gone and index.ts is rewired consistently. but the branch does not build or test yet, and there's a design regression to sort:
-
build and test break on a dangling import of a deleted module.
src/tests/distributedLock.test.ts:38-39still doesawait import("../services/webhookRetryScheduler.js")and testsretryFailedWebhooksat line 53, but this pr deleted that file.npm run buildfails with TS2307 (cannot find module ../services/webhookRetryScheduler.js) and the test errors at runtime. remove or rewrite that describe block (the import at ~38-39 and the matchingdescribe("webhookRetryScheduler", ...)), or point it at the new processor. -
this drops the multi-instance safety that #23 just added. you deleted the lock-protected
webhookRetryScheduler.ts(it took a redis lock viacacheService.setNotExists(...:running)) and kept the unprotectedwebhookRetryProcessor.ts, which runsprocessRetries()every 10s with no lock.processRetries()in webhookService.ts:290 does a plainSELECT ... LIMIT 100with noFOR UPDATE SKIP LOCKED, so two instances will grab the same rows and double-send the webhooks. either bring the redis lock back around the processor interval, or addFOR UPDATE SKIP LOCKEDand mark rows in-flight. -
prettier fails on
src/services/__tests__/webhookRetryProcessor.test.tsandsrc/services/webhookService.ts. runnpx prettier --writeon both. -
minor:
MAX_RETRY_ATTEMPTSis hardcoded to 4 in webhookService.ts, replacing the old derivedRETRY_DELAYS_MS.length + 1. correct today (3+1) but it will silently drift if the delays array changes. considerRETRY_DELAYS_MS.length + 1.
get build + test + prettier green and restore the lock semantics and this is close.
if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0
…, format - Add getClient to connection.js mocks in loanDispute, remittanceService, notificationCleanup, and auditLog tests (prevented 5 test failures due to missing export from ESM mock) - Fix TypeScript errors in src/__tests__/eventIndexer.test.ts: type getClient mock with explicit generic, fix mock.calls cast for filter, fix mockRejectedValueOnce typing, cast mockImplementation, add optional chain on queriedRanges[0] - Run prettier on src/services/__tests__/eventIndexer.test.ts
…exer-advisory-lock fix: guard event indexer poll cycle with Postgres advisory lock
- Add missing environment variables to CI workflow - Ignore demo and mock files in ESLint config to prevent linting failures Note: As part of adding the CI workflow, the build/typecheck scripts were rewritten, redis mock was dropped from jest config, and the Dockerfile was edited to better support CI.
… timeouts Fixes LabsCrypt/remitlend-backend issue LabsCrypt#16. - Dockerfile: add HEALTHCHECK that probes /health on the exposed port using busybox wget (no extra packages required), with sensible interval/timeout/start-period/retries values (30s/10s/40s/3) and a defense- in-depth wget --timeout=8. - src/app.ts: wrap each /health dependency probe (DB, Redis, Soroban RPC) in a Promise.race with a per-check timeout (default 2000ms, configurable via HEALTH_CHECK_TIMEOUT_MS) so a hung dependency degrades to "error" rather than blocking the whole endpoint and starving the Docker probe. - src/__tests__/health.test.ts: add resilience tests verifying the endpoint fails fast with the right error status when a dependency hangs and that a custom HEALTH_CHECK_TIMEOUT_MS override is honored; clean up env vars via afterEach to prevent leaks across failures.
…16-dockerfile-healthcheck chore(infra): add Docker HEALTHCHECK for /health endpoint (fixes LabsCrypt#16)
chore: add CI workflow
ogazboiz
left a comment
There was a problem hiding this comment.
this is basically there, all four items from my last review are fixed: the dangling import of the deleted scheduler is gone (now uses WebhookService.processRetries()), the multi-instance lock is back (webhook_retry_scheduler:running, NX, 120s TTL, released in finally), the shared WEBHOOK_RETRY_CONFIG/RETRY_DELAYS_MS is defined once with no divergent duplicate, and MAX_RETRY_ATTEMPTS is derived (RETRY_DELAYS_MS.length + 1). just two mechanical things:
-
CI build-and-test is failing on the lint step: src/tests/distributedLock.test.ts:37:27 prettier/prettier "Delete CRLF" (a new prettier slip in the file you edited). run: npx prettier --write src/tests/distributedLock.test.ts
-
rebase onto main, you're 17 behind (predates the CI + migration merges): git fetch origin && git rebase origin/main
fix that one prettier line, rebase, and i'll merge.
if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0
- Removed webhookRetryScheduler.ts as its logic conflicts with webhookService.ts - Extracted backoff config and max attempts into a shared config WEBHOOK_RETRY_CONFIG - Updated WebhookService.processRetries query to properly alias columns to avoid ambiguous references and to utilize the new config - Updated index.ts to wire up the surviving webhookRetryProcessor - Added test verifying retry processor respects backoff delays and max attempts Closes #123
ogazboiz
left a comment
There was a problem hiding this comment.
you're rebased now (merge-base is current main, the prior 17-behind is resolved) and all four logic items are still fixed, the lock, the derived MAX_RETRY_ATTEMPTS, the single shared config, no dangling import. the red CI check showing right now is stale (pre-rebase). just one new prettier slip:
- src/tests/distributedLock.test.ts:37 fails prettier (the await import is split across 37-38, prettier wants it on one line). run: npx prettier --write src/tests/distributedLock.test.ts then push so CI re-runs fresh.
once that pushes green i'll merge.
if you want to keep contributing, join us on Telegram: https://t.me/+DOylgFv1jyJlNzM0
|
@ogazboiz made the requested adjustment |
Description
This PR resolves the conflicting retry implementations between
webhookRetrySchedulerandwebhookService.Changes Made
webhookRetryProcessor.tsas the single surviving implementation and wired it intoindex.ts.webhookRetryScheduler.tsto remove duplicate sources of truth.WEBHOOK_RETRY_CONFIGused bywebhookService.ts.wd.id,ws.callback_url, etc.), avoiding ambiguous references and mapping correctly to the current schema.webhookRetryProcessor.test.tsasserting that the processor respects the configured backoff delays and max attempts.Closes #8