
Harmony Chat


Harmony Chat is a fullstack chat application created for learning purposes as an exploration of distributed systems, event-driven architecture, and modern DevOps practices.

We intentionally keep the application level simple to practice the more complex underlying architecture patterns necessary for high scalability, state consistency, and real-time data streaming.

Tech Stack

  • Backend: FastAPI, SQLAlchemy, Pytest
  • Frontend: TypeScript, Next.js, React Query, Tailwind CSS, Orval
  • Infrastructure: AutoMQ Kafka, PostgreSQL, DynamoDB, Redis, Centrifugo, Conduit, Redpanda Connect

Frontend Screenshots

(Screenshots: chat screen in dark mode and light mode.)


Architecture & Implementation Details

The system is designed around separating relational state (users, chat metadata) from high-volume, append-only data (chat messages), bound together by a fault-tolerant event-driven backbone.

Here is the logic behind our core architectural decisions:

1. Databases & Storage Strategy

  • PostgreSQL (State Source of Truth): Ideal for storing user profiles, chat metadata, and user-chat relational mappings. These domains require complex transactions and row-based locking, making Postgres the strict source of truth for the application state. We rely on Change Data Capture (CDC) to keep the rest of the system eventually consistent with this state.
  • DynamoDB (Message Storage): Chosen to store real-time chat data. Its low latency and high scalability are perfectly suited for a small message record data model that doesn't require relational joins. It serves as the long-term storage for messages and is eventually consistent with the Kafka topics that source them.

2. Log Source of Truth (Kafka)

Kafka acts as the source of truth for chat history. When a user sends a message, it is produced to Kafka and considered committed only once Kafka has persisted it.

  • A Kafka Connect sink with a DynamoDB integration listens to this topic, strips transient author metadata from the payload, and persists the data to DynamoDB.
  • Simultaneously, Centrifugo consumes the same topic to instantly broadcast messages to active WebSocket listeners.
  • The Architectural Advantage: Kafka acts as a buffer that decouples Centrifugo, DynamoDB, and the FastAPI backend.
    • For instance if DynamoDB experiences throttling or downtime, the system maintains short-term consistency: Centrifugo will continue delivering messages to active users in real-time, while DynamoDB safely catches up later once it recovers.

3. Change Data Capture (CDC) & The Outbox Pattern

Application state changes (e.g., adding users to a chat, deleting a chat) produce side effects that must propagate through the entire system.

  • We utilize the Transactional Outbox Pattern in PostgreSQL to atomically write state changes alongside outbox events within the same transaction.
  • Debezium reads the Postgres Write-Ahead Log (WAL) and streams these outbox events into Kafka.
  • A custom Python CDC Consumer worker (harmony.consumer) processes these events with strict idempotency guarantees to update secondary storage (postgres and dynamodb) and invalidate Redis caches.
  • The Architectural Advantage:
    • Because Kafka guarantees at-least-once delivery and our consumer event handlers are strictly idempotent, this pattern guarantees the distributed system will eventually achieve consistency with the relational source of truth, avoiding the need for distributed transactions.

4. Streaming & WebSockets (Centrifugo)

We use Centrifugo to manage persistent WebSocket connections, providing a simple and highly scalable pub/sub model for frontend clients.

  • It securely authenticates connections and channel subscriptions by proxying requests back to the FastAPI backend, which completely decouples the WebSocket connection pool from the Python API instances.
  • Under the hood, it uses a Redis engine to cache user presence and recent messages, and utilizes Redis pub/sub as the broker for real-time message data.

5. Caching Layer (Redis)

We use the Cache-Aside pattern with Redis to increase response speed and decrease database load for metadata lookups.

  • We frequently need to verify user-chat membership for authorization and hydrate chat messages with user metadata. By caching this data, we dramatically increase response speeds and reduce the load on PostgreSQL.
  • Cache invalidation is integrated into the CDC event consumer to prevent stale reads when the relational state changes.

6. Backend-For-Frontend (BFF) Auth Proxy

Authentication utilizes rotating JWT access and refresh tokens.

  • The Next.js server acts as a proxy (api/proxy/{api_path}), securely attaching HTTP-only cookies to outgoing requests bound for the internal FastAPI backend.
  • We transparently handle access token management without disrupting the client experience by using middleware (proxy.ts) to resolve expired credentials and (api/refresh) for resolving HTTP 401s.
  • The backend uses a opaque string refresh tokens stored in postgres and rotates them on refresh with a very short grace period before invalidation to handle multiple refresh requests in close proximity.
  • The frontend heavily utilizes React Query for optimistic UI updates, complete with automatic rollback capabilities upon network failure.

Infrastructure Implementation

1. GitOps & Environment Decoupling

Our infrastructure deployment strictly adheres to GitOps principles, decoupling the application logic from environment-specific configurations:

  • App Repository (Environment-Agnostic): Houses the application code and publishes template logic via Git tags and GitHub Releases (container images, Terraform templates, Helmfile templates). It maintains zero awareness of downstream configurations.
  • Config Repository (Environment-Specific): Contains environment variables, SOPS-encrypted secrets, and references to specific App Repo Git tags to trigger deployments.

The CI/CD Pipeline: We utilize a high-level values.yaml and secrets.yaml to control the entire deployment. Rendered manifests are published as an OCI Artifact using a Helmfile template, carefully decoupling secrets and dynamic Terraform outputs via AWS Secrets Manager and Parameter Store. ArgoCD then continuously syncs this artifact to the Kubernetes cluster.

2. Kubernetes Provisioning & Templating

  • Infrastructure as Code (IaC): We currently use Terragrunt to parse, decrypt, and bootstrap the underlying AWS infrastructure, including the EKS cluster. (Note: We are planning a migration to Spacelift to fully automate infrastructure creation).
  • Helmfile & Chart Management: We use Helmfile to facilitate multi-layer templating. We maintain custom Helm charts for our core application, Conduit, and utility specifications (like Karpenter and external secrets).
  • Autoscaling: We utilize Karpenter for dynamic, node-level cluster autoscaling. At the pod level, Horizontal Pod Autoscaling (HPA) is built into all stateless components.

3. Event Streaming Infrastructure (AutoMQ)

Instead of deploying a traditional shared-nothing Kafka architecture (like Apache Kafka or Redpanda) or relying on costly AWS MSK clusters, we utilize AutoMQ for our Kafka workloads.

The Architectural Advantage: AutoMQ utilizes a compute-storage decoupled model backed by AWS S3, providing several critical benefits for a Kubernetes environment:

  • Painless Scaling: Because it uses a shared storage backend (S3), we can scale brokers vertically or horizontally without suffering the massive network overhead of partition reassignment. Data never needs to be copied over the network after it is committed.
  • Optimized Durability vs. Latency: AutoMQ maintains low broker commit latency by writing to a persistent WAL buffer (an EBS volume) to achieve immediate single-AZ durability. It then batches these writes to S3 for long-term, high durability guarantees. While this trades off immediate multi-AZ consistency on commit, it completely bypasses multi-AZ network latency while maintaining excellent data safety.

Local Development & Setup

Dependencies

We use devbox to manage all development dependencies. Run devbox shell to download and enter the devbox environment. We use task to manage all development commands, which are defined in the Taskfile.yaml at the root of the repository.

Initial Setup

Download Python and Node dependencies for both the frontend and backend (this runs uv sync and npm install):

task setup

Run the Application with Docker Compose

task run:dev

Run the Application with Kind

task kind:setup

Explore Available Tasks

To see a full list of available development tasks (like running tests, generating fake data, compiling environments, or syncing OpenAPI schemas):

task

Roadmap & TODOs

  • Implement repo wrappers to interact with DynamoDB tables
  • Implement simple poll-based FastAPI backend
  • Implement simple frontend with Next.js
  • Implement backend endpoints for chat and user management
  • Implement authentication with access token JWTs
  • Use the Backend-For-Frontend pattern proxying API requests to securely set HTTP-only cookies
  • Implement pub/sub model for real-time chat updates
  • Use React Query in the frontend for optimistic server-side hydration and client-side fallback
  • Implement ULID cursor-based pagination for chat history
  • Migrate to Postgres storage for metadata, relying on transactions and row-based locking
  • Implement refresh tokens with rotation and automatic resolution of expired access tokens
  • Implement Redis caching layer for metadata and membership authorization
  • Migrate WebSocket handling to Centrifugo + Redis
  • Implement Kafka-based event sourcing for chat messages with Kafka Connect DynamoDB sink
  • Implement CDC (Change Data Capture) with Postgres Outbox + Debezium + Kafka
  • Build k8s set up with helm and deploy locally using kind.
  • Switch from Apache Kafka to AutoMQ, from Debezium to Conduit, from Kafka Connect to Redpanda Connect
  • Deploy simplified application to AWS via Terraform
  • Setup GitHub Actions CI
  • Setup argoCD
  • Upgrade k8s setup with API Gateway and Autoscaling
  • Launch production deployment
