Skip to content

mollanegash/high-throughput-text-processor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

36 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

High-Throughput Text Processing Engine

Java 21 + Spring Boot 3 | Concurrent Processing (Parallel Streams) | Stateless Backend | Docker | Terraform | Live on Render

Live URL:
https://federal-regulations-analyzer-fahp.onrender.com


Overview

A cloud-native backend application for ingesting and analyzing regulatory text data from public APIs. Built with Java 21 and Spring Boot 3, the system focuses on concurrent processing, performance optimization, and stateless architecture design.

The application processes regulatory agency data, performs text analysis, and persists results in an embedded H2 database. It is designed with a PostgreSQL-compatible schema for future migration.


Architecture

Client → Spring Boot API → Processing Layer → H2 Database (stateless service)

Key design principles:

  • Stateless backend service enabling horizontal scalability
  • No session dependency between requests

Engineering Highlights

  • Concurrent data ingestion using Java Parallel Streams
  • REST API integration with eCFR public data source
  • SHA-256 checksum generation for data integrity validation
  • Text analytics (word count, regulatory density scoring)
  • Stateless Spring Boot architecture
  • JVM tuning for containerized environments
  • Multi-stage Docker builds (380MB → 89MB)
  • Terraform infrastructure configuration
  • GitHub Actions CI/CD pipeline
  • Live deployment on Render

Performance & Load Testing

The system was evaluated using Apache JMeter under controlled load testing.

  • Tool: Apache JMeter
  • Deployment: Single-instance Render environment
  • Concurrency: (define your thread count if needed)
  • Test type: sustained load test
  • Load testing performed using Apache JMeter (single-instance Render deployment, controlled concurrency)

Results:

  • Throughput: up to ~5,000 requests/sec
  • p99 latency: ~118ms
  • Stable processing under sustained concurrent load

Note: Results reflect single-node deployment using Java Parallel Streams and H2 embedded database.


Data Storage

  • Current: H2 embedded database (development / lightweight production)
  • Future-ready: PostgreSQL-compatible schema design

Technology Stack

Layer Technology
Backend Java 21, Spring Boot 3
Database H2
Infrastructure Docker, Terraform
CI/CD GitHub Actions
Deployment Render

Key Features

  • High-concurrency data processing
  • External API ingestion (eCFR)
  • Text analytics engine
  • SHA-256 validation pipeline
  • Stateless backend design
  • Containerized deployment
  • Infrastructure as Code (IaC)
  • Automated CI/CD pipeline

What This Project Demonstrates

  • Concurrent programming in Java
  • Backend system design fundamentals
  • Performance optimization techniques
  • REST API integration
  • Cloud deployment workflows
  • Containerization with Docker
  • Infrastructure automation (Terraform)
  • Load testing with JMeter

How to Run Locally

docker build -t text-processor . docker run -p 8080:8080 text-processor

Run with Maven

./mvnw spring-boot:run

About

Java 21 + Spring Boot 3 backend system for concurrent regulatory text processing | Stateless architecture | H2 database | JMeter load testing | Docker + Terraform | Live on Render

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors