From 810c9794c5c4bb43dc1a6204ce6e243b27eea43f Mon Sep 17 00:00:00 2001
From: Shantanu Mane
Date: Wed, 17 Jun 2026 08:21:18 +0530
Subject: [PATCH] =?UTF-8?q?docs(readme):=20refresh=20for=20v1.0.0=20?=
 =?UTF-8?q?=E2=80=94=20storage,=20config,=20releases?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bring the README in line with the current codebase:

- S3/MinIO storage is implemented (Go + worker), not "coming soon"
- Correct env vars: BUCKET_PROVIDER/S3_*, REDIS_CONNECTION_STRING,
  required ENCRYPTION_KEY (32 bytes), HOST/PORT, AUTO_MIGRATE
- Drop non-existent docker-compose.yml and requirements.txt steps;
  document GHCR images + per-dockerfile builds
- Add Observability and Releases sections; document staging vs LTS tags
- Update project structure (storagex, observability, worker/storage factory)
- Fix test commands (task test / pytest worker/tests), contributing flow

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 README.md | 227 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 121 insertions(+), 106 deletions(-)

diff --git a/README.md b/README.md
index a68500b..4af88f1 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
 # MPiper 🎬
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Release](https://img.shields.io/badge/release-v1.0.0--lts-brightgreen.svg)](https://github.com/rndmcodeguy20/mpiper/releases/tag/v1.0.0)
 [![Go Version](https://img.shields.io/badge/Go-1.24-blue.svg)](https://golang.org/)
 [![Python Version](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
 
@@ -9,20 +10,23 @@ A lightweight, scalable media processing pipeline built with Go and Python. MPip
 ## 🌟 Features
 
 - **RESTful API Server** - High-performance Go server built with Chi router
-- **Asynchronous Processing** - Redis-based job queue for scalable media processing
-- **Multi-Cloud Storage** - Support for Google Cloud Storage (GCS) and AWS S3
-- **Image Processing** - Automatic generation of optimized image variants (thumbnails, different formats)
-- **Video Processing** - Video transcoding and optimization
-- **Database-Backed** - PostgreSQL for reliable metadata and job tracking
-- **Docker Ready** - Containerized deployment with Kubernetes support
-- **Production Ready** - Structured logging, error handling, and recovery middleware
+- **Asynchronous Processing** - Redis Streams job queue for scalable media processing
+- **Pluggable Storage** - GCS and S3/MinIO (any S3-compatible store) behind a single provider abstraction, selected by config
+- **Image Processing** - Automatic generation of optimized, content-addressed image variants (resize, re-encode, format conversion)
+- **Video Processing** - Poster generation, 720p transcode, and preview clips
+- **Database-Backed** - PostgreSQL as the durable source of truth for assets, variants, and jobs
+- **Webhooks** - Registration and delivery tracking tables for outbound event notifications
+- **Observability** - OpenTelemetry tracing + metrics on the API, Prometheus metrics on the worker, with a bundled Grafana/Tempo/Loki/Prometheus stack
+- **Docker & Kubernetes Ready** - Multi-stage images and manifests for containerized deployment
 
 ## 🏗️ Architecture
 
+Two-service pipeline communicating over **Redis Streams** (`media:jobs`). PostgreSQL is the durable source of truth; Redis is transport-only.
+
 ```
 ┌─────────────┐         ┌──────────────┐         ┌─────────────┐
 │   Client    │────────▶│   Go API     │────────▶│   Redis     │
-│             │         │   Server     │         │   Queue     │
+│             │         │   Server     │         │   Streams   │
 └─────────────┘         └──────────────┘         └─────────────┘
                                │                        │
                                ▼                        ▼
@@ -33,18 +37,18 @@ A lightweight, scalable media processing pipeline built with Go and Python. MPip
                                │                        │
                                ▼                        ▼
                           ┌──────────────────────────────────┐
-                          │     Cloud Storage (GCS/S3)       │
+                          │ Object Storage (GCS / S3 / MinIO)│
                           └──────────────────────────────────┘
 ```
 
 **Flow:**
-1. Client uploads media via REST API
-2. Go server generates signed upload URL and creates job
-3. Client uploads directly to cloud storage
-4. Job is queued in Redis
-5. Python worker processes media (resize, transcode, optimize)
-6. Variants are stored back to cloud storage
-7. Database is updated with asset status and metadata
+1. Client requests an upload via the REST API
+2. Go server creates the asset + job and returns a presigned upload URL
+3. Client uploads the raw file directly to object storage
+4. Client marks the asset uploaded; the job is enqueued on the Redis stream
+5. Python worker consumes the job, processes media (resize, transcode, optimize)
+6. Variants are written back to object storage (deduplicated by content hash)
+7. Database is updated with asset status and variant metadata
 
 ## 📋 Prerequisites
 
@@ -53,7 +57,7 @@ A lightweight, scalable media processing pipeline built with Go and Python. MPip
 - **PostgreSQL** 12 or higher
 - **Redis** 6 or higher
 - **Task** (optional, for build automation) - [Installation guide](https://taskfile.dev/installation/)
-- Cloud storage account (GCS or AWS S3)
+- Object storage: a GCS bucket, or any S3-compatible store (AWS S3 / **MinIO** for fully-local runs)
 
 ## 🚀 Quick Start
 
@@ -66,13 +70,16 @@ cd mpiper
 
 ### 2. Configure Environment
 
-Create a `.env` file in the project root:
+Create a `.env.local` file in the project root (`development` → `.env.local`, `staging` → `.env.staging`, `production` → `.env`).
+
+`ENV`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `REDIS_CONNECTION_STRING`, and `ENCRYPTION_KEY` (**exactly 32 bytes**) are required; the config panics without them.
 
 ```env
-# Server Configuration
-SERVER_HOST=localhost
-SERVER_PORT=8080
+# Server
 ENV=development
+HOST=0.0.0.0
+PORT=8080
+LOG_LEVEL=DEBUG
 
 # Database
 DB_HOST=localhost
@@ -80,32 +87,43 @@ DB_PORT=5432
 DB_USER=postgres
 DB_PASSWORD=your_password
 DB_NAME=mpiper
-DB_SSLMODE=disable
+DB_SSL_MODE=false
+AUTO_MIGRATE=true  # run embedded SQL migrations on startup
+
+# Redis (transport for the job stream)
+REDIS_CONNECTION_STRING=redis://localhost:6379/0
+
+# Security (must be exactly 32 bytes)
+ENCRYPTION_KEY=change_me_to_a_32_byte_secret___
 
-# Redis
-REDIS_HOST=localhost
-REDIS_PORT=6379
-REDIS_PASSWORD=
-REDIS_DB=0
+# Storage - pick a provider
+BUCKET_PROVIDER=gcs  # gcs | s3
+BUCKET_NAME=your-bucket-name
 
-# Storage (GCS)
-STORAGE_PROVIDER=gcp
-GCS_BUCKET=your-bucket-name
-GCS_CREDENTIALS_PATH=.secrets/service-account.json
+# GCS provider
+GCS_SA_PATH=.secrets/service-account.json
+
+# S3 / MinIO provider (used when BUCKET_PROVIDER=s3)
+S3_BUCKET_NAME=your-bucket-name
+S3_REGION=us-east-1
+S3_ACCESS_KEY_ID=your-access-key
+S3_SECRET_ACCESS_KEY=your-secret-key
+S3_ENDPOINT_URL=http://localhost:9000  # set for MinIO / S3-compatible stores
 
 # Worker
-TEMP_DIR=/tmp/mpiper
 STREAM_NAME=media:jobs
 JOB_POLL_INTERVAL=1
+MAX_CONCURRENT_JOBS=5
 ```
 
-### 3. Set Up Database
+> The worker reads the same `S3_*` variables as the Go server (falling back to `BUCKET_*`), so one `.env` drives both services.
+
+### 3. Set Up the Database
+
+Migrations run automatically on startup when `AUTO_MIGRATE=true`: both the Go server and the Python worker apply the embedded SQL migrations. To apply them manually instead:
 
 ```bash
-# Create database
 createdb mpiper
-
-# Run migrations
 psql -d mpiper -f db/migrations/001_seed.sql
 ```
 
@@ -116,37 +134,27 @@ psql -d mpiper -f db/migrations/001_seed.sql
 go mod download
 ```
 
-**Python Worker:**
+**Python Worker** (managed with Poetry):
 ```bash
-pip install poetry
+pipx install poetry  # or: pip install poetry
 poetry install
 ```
 
-Or using pip directly:
-```bash
-pip install -r requirements.txt
-```
-
 ### 5. Run the Services
 
 **Option A: Using Task (Recommended)**
 
 ```bash
-# Run API server
-task dev
+task dev  # API server (ENV=development, hot-reload via `task run`)
 
-# Run worker (in another terminal)
-poetry run python -m worker
+poetry run python -m worker  # worker, in another terminal
 ```
 
 **Option B: Manual**
 
 ```bash
-# Run API server
-go run cmd/server/main.go
-
-# Run worker
-python -m worker
+go run cmd/server/main.go  # API server
+python -m worker           # worker
 ```
 
 ### 6. Test the API
@@ -163,23 +171,27 @@ curl -X POST http://localhost:8080/api/v1/assets/upload \
 
 ## 🐳 Docker Deployment
 
-### Build Images
+### Pull the published image (GHCR)
 
-```bash
-# Build API server
-docker build -t mpiper-api:latest -f deploy/docker/mpiper.dockerfile .
+LTS images are published to the GitHub Container Registry:
 
-# Build worker
-docker build -t mpiper-worker:latest -f deploy/docker/worker.dockerfile .
+```bash +docker pull ghcr.io/rndmcodeguy20/mpiper:lts # latest LTS +docker pull ghcr.io/rndmcodeguy20/mpiper:1.0.0-lts # pinned LTS +docker pull ghcr.io/rndmcodeguy20/mpiper:staging # latest staging build ``` -### Run with Docker Compose +### Build locally ```bash -docker-compose up -d +# API server +docker build -t mpiper-api:latest -f deploy/docker/mpiper.dockerfile . + +# Worker +docker build -t mpiper-worker:latest -f deploy/docker/worker.dockerfile . ``` -### Kubernetes Deployment +### Kubernetes ```bash kubectl apply -f deploy/k8s/ @@ -203,18 +215,20 @@ kubectl apply -f deploy/k8s/ **Response:** ```json { - "uploadUrl": "https://storage.googleapis.com/...", + "uploadUrl": "https:///...", "assetId": "550e8400-e29b-41d4-a716-446655440000", "method": "PUT", "headers": { "Content-Type": "image/jpeg" }, "objectPath": "media/raw/550e8400-e29b-41d4-a716-446655440000", - "publicUrl": "https://storage.googleapis.com/...", + "publicUrl": "https:///...", "expiresAt": 1702468800 } ``` +> The `uploadUrl` / `publicUrl` host depends on the configured storage provider (GCS, S3, or a MinIO endpoint). 
+
 ### Mark Asset as Uploaded
 
 **Endpoint:** `POST /api/v1/assets/{assetId}/uploaded`
@@ -236,28 +250,31 @@ mpiper/
 ├── cmd/
 │   └── server/          # API server entry point
 ├── internal/
-│   ├── config/          # Configuration management
-│   ├── database/        # Database connections
+│   ├── config/          # Configuration management (env-driven singleton)
+│   ├── database/        # Postgres pool + embedded migrations
 │   ├── handler/         # HTTP handlers
+│   ├── metrics/         # OTel metric instruments + provider init
 │   ├── middleware/      # HTTP middleware
-│   ├── models/          # Data models
-│   ├── queue/           # Redis queue implementation
-│   ├── repository/      # Database repositories
-│   ├── router/          # Route definitions
+│   ├── models/          # Request/response models
+│   ├── queue/           # Redis Streams producer
+│   ├── repository/      # SQL repositories (sqlx)
+│   ├── router/          # Route registration
 │   ├── server/          # Server setup
 │   └── service/         # Business logic
 ├── pkg/
-│   ├── errors/          # Error handling
-│   └── utils/           # Utility functions
+│   ├── errors/          # Typed API errors
+│   └── utils/
+│       └── storagex/    # Storage abstraction (GCS, S3/MinIO)
 ├── worker/
-│   ├── consumer/        # Job consumer
-│   ├── processing/      # Media processing logic
-│   ├── storage/         # Storage adapters
-│   └── utils/           # Worker utilities
+│   ├── consumer/        # Redis Streams consumer + config
+│   ├── processing/      # Image/video processing
+│   ├── storage/         # Storage adapters (base ABC, GCS, S3) + factory
+│   └── utils/           # Worker utilities (metrics)
 ├── db/
 │   └── migrations/      # SQL migrations
+├── observability/       # OTel collector + Grafana/Tempo/Loki/Prometheus
 └── deploy/
-    ├── docker/          # Docker files
+    ├── docker/          # Dockerfiles (mpiper, worker)
     └── k8s/             # Kubernetes manifests
 ```
 
@@ -265,12 +282,14 @@ mpiper/
 
 **Go tests:**
 ```bash
-go test ./...
+task test                    # gotestsum
+task test -- ./internal/...  # specific package
+task test-coverage           # generates coverage.html
 ```
 
 **Python tests:**
 ```bash
-poetry run pytest
+poetry run pytest worker/tests/
 ```
 
 ### Build for Production
@@ -287,41 +306,46 @@ CGO_ENABLED=0 go build -ldflags="-w -s" -o build/mpiper cmd/server/main.go
 
 ### Server Configuration
 
-The server can be configured via environment variables or a configuration file. See [`internal/config/env.go`](internal/config/env.go) for all available options.
+The server is configured via environment variables. See [`internal/config/env.go`](internal/config/env.go) for all available options; worker options live in [`worker/consumer/config.py`](worker/consumer/config.py).
 
 ### Storage Providers
 
-MPiper supports multiple cloud storage providers:
+MPiper selects a storage backend via `BUCKET_PROVIDER`:
 
-- **Google Cloud Storage (GCS)** - Default, recommended for production
-- **AWS S3** - Coming soon
-- **Azure Blob Storage** - Coming soon
+- **Google Cloud Storage (GCS)** - set `GCS_SA_PATH` to a service-account key
+- **AWS S3 / S3-compatible (MinIO)** - set the `S3_*` variables; `S3_ENDPOINT_URL` switches the client to path-style addressing for MinIO and other S3-compatible stores
+- **Azure Blob Storage** - planned
 
-### Worker Configuration
+Both the Go API and the Python worker share the same provider selection and env vars, so a single configuration drives the whole pipeline.
 
-Configure worker behavior in `worker/consumer/config.py`:
+### Observability
 
-- Processing pipelines (image/video)
-- Variant generation rules
-- Storage destinations
-- Concurrency settings
+The API emits OpenTelemetry traces and metrics; the worker exposes Prometheus metrics. The `observability/` directory contains a ready-to-run collector plus Grafana, Tempo, Loki, and Prometheus configuration.
+ +## πŸ“¦ Releases + +MPiper uses a two-track build pipeline: + +- **Staging** β€” every push to `staging` builds and pushes images tagged `{version}`, `{version}-{sha}`, `{sha}`, and `staging`. +- **LTS** β€” every push to `master` builds the production long-term-support images tagged `lts`, `{version}-lts`, and `{sha}-lts`. + +The version is sourced from the [`.version`](.version) file and embedded into the binary via ldflags (`main.Version`). **v1.0.0** is the initial LTS release β€” see [Releases](https://github.com/rndmcodeguy20/mpiper/releases). ## 🀝 Contributing -Contributions are welcome! Please follow these steps: +Contributions are welcome! Development happens on `staging`; `master` holds stable LTS releases. 1. Fork the repository -2. Create a feature branch (`git checkout -b feature/amazing-feature`) -3. Commit your changes (`git commit -m 'Add amazing feature'`) -4. Push to the branch (`git push origin feature/amazing-feature`) -5. Open a Pull Request +2. Create a feature branch off `staging` (`git checkout -b feat/amazing-feature`) +3. Commit your changes +4. 
Push the branch and open a Pull Request **against `staging`** ### Development Guidelines - Write tests for new features - Follow Go and Python best practices - Update documentation as needed -- Ensure all tests pass before submitting PR +- Ensure all tests pass before submitting a PR ## πŸ“ License @@ -343,10 +367,10 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file ## πŸ“Š Roadmap -- [ ] Support for AWS S3 storage +- [x] Support for AWS S3 / MinIO storage +- [x] Webhook delivery tracking (schema) - [ ] Support for Azure Blob Storage - [ ] Video transcoding with FFmpeg -- [ ] Webhook notifications - [ ] Admin dashboard - [ ] Batch processing API - [ ] CDN integration @@ -357,15 +381,6 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file Please use the [GitHub Issues](https://github.com/rndmcodeguy20/mpiper/issues) page to report bugs or request features. -## πŸ“š Additional Resources - -- [Go Documentation](https://golang.org/doc/) -- [Python Documentation](https://docs.python.org/) -- [PostgreSQL Documentation](https://www.postgresql.org/docs/) -- [Redis Documentation](https://redis.io/documentation) -- [Task Documentation](https://taskfile.dev/) - --- Made with ❀️ by [Shantanu Mane](https://github.com/rndmcodeguy20) -
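
Review note on the configuration rules this patch documents: the README now states that `ENV`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`, `REDIS_CONNECTION_STRING`, and `ENCRYPTION_KEY` are required and that the key must be exactly 32 bytes. A minimal Python sketch of such a pre-flight check follows. The function `check_env` and the constants are hypothetical names used only for illustration; the project's actual validation code is not shown in this patch and may behave differently.

```python
from typing import Mapping

# Variable names copied from the README text in this patch.
REQUIRED_VARS = (
    "ENV",
    "DB_USER",
    "DB_PASSWORD",
    "DB_NAME",
    "REDIS_CONNECTION_STRING",
    "ENCRYPTION_KEY",
)

# The README requires the key to be exactly this many bytes.
ENCRYPTION_KEY_BYTES = 32


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    Hypothetical helper: it only mirrors the two rules the README states
    (required variables present, key length exactly 32 bytes).
    """
    problems = [
        f"missing required variable: {name}"
        for name in REQUIRED_VARS
        if not env.get(name)
    ]

    key = env.get("ENCRYPTION_KEY", "")
    if key:
        # Measure bytes, not characters: non-ASCII characters occupy
        # more than one byte in UTF-8.
        size = len(key.encode("utf-8"))
        if size != ENCRYPTION_KEY_BYTES:
            problems.append(
                f"ENCRYPTION_KEY is {size} bytes, expected exactly {ENCRYPTION_KEY_BYTES}"
            )
    return problems
```

Calling `check_env(os.environ)` before starting either service would report a mistyped or padded key up front rather than at startup. Counting bytes instead of characters is the deliberate choice here, since the README phrases the requirement in bytes.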