FEAT: Challenge 1 – Payment Settlement Pipeline #14

Open
EdwinLRG wants to merge 1 commit into yaperos:main from EdwinLRG:challenge/EdwinR

@EdwinLRG

PR Description: Challenge 1 – Payment Settlement Pipeline

  1. The chosen challenge and why

I chose Challenge 1 because I believe payment processing is the heart of a fintech company. This challenge allows me to demonstrate how to design a system that not only "works" but is also resilient and financially secure, handling real-world problems such as eventual consistency and fault tolerance in distributed systems.

  2. Architectural decisions and rejected alternatives

Transactional Outbox for reliable delivery:
• My decision: I implemented the Transactional Outbox pattern. I store the payment and its outbox event in the same PostgreSQL transaction. A separate Relay process handles publishing to Kafka.
• What I rejected: Publishing directly to Kafka within the payment service using @transaction().
• Why: Doing so couples the database with the broker. If the message is sent to Kafka but the database transaction fails (rollback), we would have a "phantom message" being processed without an actual record of the payment. My approach ensures that only what was actually persisted is reported.
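
A minimal in-memory sketch of the outbox idea described above, not the code in this PR. It assumes nothing about the real schema: `InMemoryDb`, `createPayment`, `relayOnce`, and the `payment.created` topic name are all hypothetical, and the "transaction" is simulated by committing a copy of the tables only when the callback does not throw.

```typescript
// Hypothetical in-memory model of the Transactional Outbox pattern.
// None of these names come from the PR; they exist only for illustration.
interface Payment { id: string; amount: number; }
interface OutboxEvent { eventId: string; topic: string; payload: string; published: boolean; }
interface Tables { payments: Payment[]; outbox: OutboxEvent[]; }

class InMemoryDb {
  private tables: Tables = { payments: [], outbox: [] };

  // Runs `work` on a draft copy and commits only if it does not throw,
  // mimicking the all-or-nothing behaviour of a database transaction.
  transaction(work: (tx: Tables) => void): void {
    const draft: Tables = {
      payments: this.tables.payments.map((p) => ({ ...p })),
      outbox: this.tables.outbox.map((e) => ({ ...e })),
    };
    work(draft);         // a throw here leaves this.tables untouched (rollback)
    this.tables = draft; // commit
  }

  pendingEvents(): OutboxEvent[] {
    return this.tables.outbox.filter((e) => !e.published);
  }

  markPublished(eventId: string): void {
    for (const e of this.tables.outbox) {
      if (e.eventId === eventId) e.published = true;
    }
  }

  paymentCount(): number {
    return this.tables.payments.length;
  }
}

// Writes the payment and its outbox event in ONE transaction.
// `failBeforeCommit` simulates an error that forces a rollback.
function createPayment(db: InMemoryDb, payment: Payment, failBeforeCommit = false): void {
  db.transaction((tx) => {
    tx.payments.push(payment);
    tx.outbox.push({
      eventId: `evt-${payment.id}`,
      topic: "payment.created", // assumed topic name
      payload: JSON.stringify(payment),
      published: false,
    });
    if (failBeforeCommit) throw new Error("simulated failure before commit");
  });
}

type Publish = (topic: string, payload: string) => void;

// One polling pass of the relay: publish every pending event, then mark it.
// If `publish` throws, the event stays pending and is retried on the next pass.
function relayOnce(db: InMemoryDb, publish: Publish): number {
  let sent = 0;
  for (const event of db.pendingEvents()) {
    publish(event.topic, event.payload);
    db.markPublished(event.eventId);
    sent++;
  }
  return sent;
}
```

In this sketch a rolled-back `createPayment` leaves no outbox row, so the relay has nothing to publish and no phantom message can appear. A crash between `publish` and `markPublished` would resend the event on the next pass, so delivery is at-least-once and consumers still need to tolerate duplicates.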

Idempotence in Consumers:
• My decision: I designed the consumers to be idempotent using a control table (processed_events) with a unique key composed of the eventId and the consumerName.
• What I rejected: Relying on Kafka not sending duplicates or on the producer's idempotence.
• Why: In distributed systems, redelivery is inevitable. By ensuring idempotence in the consumer, I guarantee that there will be no double charges or balance corruption even if the message arrives multiple times.
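
The deduplication check can be illustrated with a small in-memory sketch. It is not the PR's implementation: `ProcessedEvents` stands in for the `processed_events` table, and `LedgerConsumer`, `PaymentEvent`, and the `"ledger"` consumer name are hypothetical. The `Set` plays the role of the unique key on (`eventId`, `consumerName`).

```typescript
// Hypothetical stand-in for the processed_events control table.
class ProcessedEvents {
  private keys = new Set<string>();

  // Mimics an INSERT guarded by a unique key on (eventId, consumerName):
  // returns false when that pair has already been recorded.
  tryInsert(eventId: string, consumerName: string): boolean {
    const key = JSON.stringify([consumerName, eventId]);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// Assumed event shape, for illustration only.
interface PaymentEvent { eventId: string; accountId: string; amount: number; }

// Hypothetical consumer that credits an account balance exactly once per event.
class LedgerConsumer {
  readonly consumerName = "ledger";
  private processed: ProcessedEvents;
  private balances = new Map<string, number>();

  constructor(processed: ProcessedEvents) {
    this.processed = processed;
  }

  handle(event: PaymentEvent): "applied" | "duplicate" {
    // Redelivered events fail the insert and are skipped without side effects.
    if (!this.processed.tryInsert(event.eventId, this.consumerName)) {
      return "duplicate";
    }
    const current = this.balances.get(event.accountId) ?? 0;
    this.balances.set(event.accountId, current + event.amount);
    return "applied";
  }

  balanceOf(accountId: string): number {
    return this.balances.get(accountId) ?? 0;
  }
}
```

Because the key includes the consumer name, two different consumers can each process the same event once. Against a real database, the insert into the control table and the business write would normally share one transaction, so a crash between them cannot record an event as processed without applying its effect.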

Error Handling with DLT:
• My decision: I implemented a Dead Letter Topic (DLT). If a message still fails after 3 retries with exponential backoff, I move it to the DLT for manual analysis or reprocessing.
• What I rejected: Infinite retries or ignoring the error.
• Why: A corrupted message (poison pill) should not block the processing of the messages queued behind it or the operation of other components.
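
The retry policy can be sketched as a pure decision function plus a small driver. This is an illustration under assumptions, not the PR's code: it reads "3 retries" as one initial attempt followed by up to three retries, and the 1000 ms base delay, `decideAfterFailure`, `processWithRetry`, and `sendToDlt` are all hypothetical. The driver records the delays instead of sleeping so the example stays synchronous.

```typescript
type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | { kind: "dead-letter" };

// `failedAttempts` counts attempts that have failed so far (1 after the first failure).
// Assumed policy: up to `maxRetries` retries, with the delay doubling each time.
function decideAfterFailure(
  failedAttempts: number,
  maxRetries = 3,
  baseDelayMs = 1000, // assumed base delay
): RetryDecision {
  if (failedAttempts > maxRetries) return { kind: "dead-letter" };
  return { kind: "retry", delayMs: baseDelayMs * Math.pow(2, failedAttempts - 1) };
}

interface Outcome {
  status: "processed" | "dead-lettered";
  attempts: number;
  delaysMs: number[];
}

// Runs `handler` until it succeeds or the retry budget is exhausted.
// A real consumer would wait `delayMs` before each retry; here the delay is only recorded.
function processWithRetry<T>(
  message: T,
  handler: (m: T) => void,
  sendToDlt: (m: T, reason: string) => void,
): Outcome {
  const delaysMs: number[] = [];
  let attempts = 0;
  for (;;) {
    attempts++;
    try {
      handler(message);
      return { status: "processed", attempts, delaysMs };
    } catch (err) {
      const decision = decideAfterFailure(attempts);
      if (decision.kind === "dead-letter") {
        // Hand the poison pill to the DLT so later messages are not blocked.
        sendToDlt(message, err instanceof Error ? err.message : String(err));
        return { status: "dead-lettered", attempts, delaysMs };
      }
      delaysMs.push(decision.delayMs);
    }
  }
}
```

Under these assumptions a handler that always throws is attempted four times, with recorded delays of 1000, 2000, and 4000 ms, before the message is handed to `sendToDlt`. Keeping the policy in a pure function makes the retry limit and the backoff curve easy to unit test separately from any broker client.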

  3. What I would do differently with more time
    • Observability: I would integrate a tracing system (OpenTelemetry) to track a payment from when it enters the API until it reaches the DLT in case of an error.
    • Governance: I would use a Schema Registry to ensure that any changes to the message structure are compatible with services already in production.

  4. Limitations and shortcuts taken
    • Infrastructure: I used a single-node configuration for Kafka and Postgres in Docker to facilitate local execution of the challenge.
    • Scaffolding: The Fraud and Notification services are simplified (log-based), allowing me to focus on the robustness of the event architecture.
    • Schemas: I enabled synchronize: true in the ORM to expedite review, although in a production environment, I would strictly use versioned migrations.
