High-Throughput Text Processing Engine

Live URL:
https://federal-regulations-analyzer-fahp.onrender.com

Overview

A cloud-native backend application for ingesting and analyzing regulatory text data from public APIs. Built with Java 21 and Spring Boot 3, the system focuses on concurrent processing, performance optimization, and stateless architecture design.

The application processes regulatory agency data, performs text analysis, and persists results in an embedded H2 database. It is designed with a PostgreSQL-compatible schema for future migration.

Architecture

Client → Spring Boot API → Processing Layer → H2 Database (stateless service)

Key design principles:

Stateless backend service enabling horizontal scalability
No session dependency between requests

Engineering Highlights

Concurrent data ingestion using Java Parallel Streams
REST API integration with eCFR public data source
SHA-256 checksum generation for data integrity validation
Text analytics (word count, regulatory density scoring)
Stateless Spring Boot architecture
JVM tuning for containerized environments
Multi-stage Docker builds (380MB → 89MB)
Terraform infrastructure configuration
GitHub Actions CI/CD pipeline
Live deployment on Render

Performance & Load Testing

The system was evaluated using Apache JMeter under controlled load testing.

Tool: Apache JMeter
Deployment: Single-instance Render environment
Concurrency: (define your thread count if needed)
Test type: sustained load test
Load testing performed using Apache JMeter (single-instance Render deployment, controlled concurrency)

Results:

Throughput: up to ~5,000 requests/sec
p99 latency: ~118ms
Stable processing under sustained concurrent load

Note: Results reflect single-node deployment using Java Parallel Streams and H2 embedded database.

Data Storage

Current: H2 embedded database (development / lightweight production)
Future-ready: PostgreSQL-compatible schema design

Technology Stack

Layer	Technology
Backend	Java 21, Spring Boot 3
Database	H2
Infrastructure	Docker, Terraform
CI/CD	GitHub Actions
Deployment	Render

Key Features

High-concurrency data processing
External API ingestion (eCFR)
Text analytics engine
SHA-256 validation pipeline
Stateless backend design
Containerized deployment
Infrastructure as Code (IaC)
Automated CI/CD pipeline

What This Project Demonstrates

Concurrent programming in Java
Backend system design fundamentals
Performance optimization techniques
REST API integration
Cloud deployment workflows
Containerization with Docker
Infrastructure automation (Terraform)
Load testing with JMeter

How to Run Locally

docker build -t text-processor . docker run -p 8080:8080 text-processor

Run with Maven

./mvnw spring-boot:run

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.mvn/wrapper		.mvn/wrapper
data		data
src		src
terraform		terraform
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

High-Throughput Text Processing Engine

Overview

Architecture

Engineering Highlights

Performance & Load Testing

Data Storage

Technology Stack

Key Features

What This Project Demonstrates

How to Run Locally

Run with Maven

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

High-Throughput Text Processing Engine

Overview

Architecture

Engineering Highlights

Performance & Load Testing

Data Storage

Technology Stack

Key Features

What This Project Demonstrates

How to Run Locally

Run with Maven

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages