
thisis-Shitanshu/hybrid-signalforge


Hybrid SignalForge

Write-up: Hybrid AI Data Platform for Sensor and RF Pipelines

Overview

Hybrid SignalForge is an end-to-end AI data platform for RF/sensor pipelines. It generates synthetic baseband I/Q windows with impairments (SNR, CFO, phase), versions datasets in lakeFS, runs Spark feature engineering, trains a modulation classifier in MLflow, monitors drift with Evidently, and serves predictions via FastAPI. The stack is Docker Compose–first and can later switch to AWS S3 with minimal changes.
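To make the data-generation step concrete, here is a minimal sketch of producing one impaired baseband I/Q window. The function name and parameters are illustrative only, not the actual API of src/signalforge/data_gen.py:

```python
import numpy as np

def make_iq_window(n=1024, snr_db=20.0, cfo_hz=500.0, fs=1e6, phase=0.3, rng=None):
    """Generate one QPSK I/Q window with SNR, CFO, and phase impairments.

    Hypothetical helper for illustration; not the project's real generator.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random QPSK symbols on the unit circle
    symbols = np.exp(1j * (np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, n)))
    # Apply carrier frequency offset and a static phase rotation
    t = np.arange(n) / fs
    x = symbols * np.exp(1j * (2 * np.pi * cfo_hz * t + phase))
    # Add complex AWGN at the requested SNR (signal power is 1)
    noise_power = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    x = x + noise
    return x.real.astype(np.float32), x.imag.astype(np.float32)

i, q = make_iq_window()
```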

What this project delivers

  • Versioned datasets with lakeFS branches and commits
  • Reproducible orchestration in Airflow (ingest → quality → features → train → drift/merge)
  • Spark-based feature engineering
  • MLflow experiment tracking + model registry with Production promotion
  • Drift metrics exported to Prometheus and Grafana
  • A FastAPI service plus a small Python SDK for consumption
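The Spark step turns raw I/Q windows into tabular features. The sketch below shows the kind of per-window features involved; it is an illustrative stand-in, not the actual contents of spark_jobs/feature_engineering.py:

```python
import numpy as np

def extract_features(i, q):
    """Compute simple per-window statistics from I/Q samples.

    Illustrative feature set only; the project's real features are not
    documented here.
    """
    x = np.asarray(i) + 1j * np.asarray(q)
    amp = np.abs(x)
    phase = np.unwrap(np.angle(x))
    return {
        "amp_mean": float(amp.mean()),
        "amp_std": float(amp.std()),
        # Peak-to-average power ratio in dB
        "papr_db": float(10 * np.log10((amp.max() ** 2) / np.mean(amp ** 2))),
        # Mean phase increment per sample, a rough proxy for CFO
        "phase_rate": float(np.mean(np.diff(phase))),
    }

feats = extract_features([1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0])
```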

Local UIs and endpoints

  • Airflow: http://localhost:8080
  • lakeFS: http://localhost:8000
  • MLflow: http://localhost:5000
  • API (FastAPI): http://localhost:9000

Project layout

HYBRID-SIGNALFORGE/
├── airflow/
│   └── dags/
│       └── signalforge_pipeline.py
├── api/
│   ├── Dockerfile
│   └── app.py
├── docker/
│   ├── airflow/
│   │   └── Dockerfile
│   ├── grafana/
│   │   ├── dashboards/
│   │   └── provisioning/
│   │       ├── dashboards/
│   │       │   └── dashboard.yml
│   │       └── datasources/
│   │           └── datasource.yml
│   ├── mlflow/
│   │   └── Dockerfile
│   ├── postgres/
│   │   └── init.sql
│   ├── prometheus/
│   │   └── prometheus.yml
│   └── spark/
│       └── Dockerfile
├── spark_jobs/
│   └── feature_engineering.py
├── src/
│   └── signalforge/
│       ├── __init__.py
│       ├── config.py
│       ├── data_gen.py
│       ├── drift.py
│       ├── feature_extract.py
│       ├── lakefs_bootstrap.py
│       ├── lakefs_ops.py
│       ├── quality.py
│       └── train.py
├── .env
├── .env.example
├── .gitignore
├── docker-compose.airflow.yml
├── docker-compose.base.yml
├── docker-compose.obs.yml
├── docker-compose.spark.yml
├── docker-compose.yml
├── README.md
├── requirements.airflow.txt
├── requirements.api.txt
├── requirements.mlflow.txt
└── requirements.spark.txt

Prerequisites

  • Docker Engine + Docker Compose v2
  • Python 3.11+
  • Make (optional)

Quick start

1) Configure environment

cp .env.example .env

2) Start the stack

docker compose up -d --build

3) Initialize Airflow (one-time)

docker compose run --rm airflow-init

4) Bootstrap lakeFS

make bootstrap

Without Make:

docker compose exec airflow-webserver python -m signalforge.lakefs_bootstrap

5) Trigger the pipeline

make trigger

Without Make:

docker compose exec airflow-webserver airflow dags trigger signalforge_end_to_end

Verification checklist

Airflow

  1. Open http://localhost:8080
  2. Login: admin / admin
  3. DAG: signalforge_end_to_end is green

lakeFS

  1. Open http://localhost:8000
  2. Login with access key/secret from .env
  3. Repo shows branches, commits, and merge into main

MLflow

  1. Open http://localhost:5000
  2. Experiment signalforge_modulation has runs
  3. Registered model rf_modulation_classifier is in Production

API

Health:

curl -s http://localhost:9000/health

Model:

curl -s http://localhost:9000/model

Predict:

curl -s -X POST http://localhost:9000/predict \
  -H "Content-Type: application/json" \
  -d '{"i":[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.9,0.8,0.7,0.6,0.5,0.4],"q":[0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1]}'

Monitoring

Drift metrics are exported to Prometheus and visualized in Grafana, with dashboards and datasources provisioned from docker/grafana/provisioning/.

Design notes and tradeoffs

  • Spark submission runs inside Docker for local simplicity. In production, use Spark on Kubernetes or a managed Spark service and submit via operator/API.
  • Drift is computed against a baseline run ID for clarity. Production setups usually track baselines in metadata and compare rolling windows.
  • Governance is lightweight (lakeFS commits + stored quality/drift reports). This keeps the system lean but audit-friendly.
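The baseline-versus-current drift comparison can be illustrated with a Population Stability Index over a single feature. This is an illustrative stand-in for the Evidently-based computation in src/signalforge/drift.py, not the project's code:

```python
import numpy as np

def psi(baseline, current, bins=10):
    """Population Stability Index between baseline and current samples.

    Bins are chosen from baseline quantiles; values near 0 mean no drift,
    values above roughly 0.2 are commonly treated as significant drift.
    """
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range current values
    b, _ = np.histogram(baseline, edges)
    c, _ = np.histogram(current, edges)
    b = np.clip(b / b.sum(), 1e-6, None)
    c = np.clip(c / c.sum(), 1e-6, None)
    return float(np.sum((c - b) * np.log(c / b)))

rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)      # identical distribution: small PSI
shifted = rng.normal(0.8, 1.0, 5000)   # mean-shifted distribution: large PSI
```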

Useful commands

Start or rebuild:

docker compose up -d --build

Logs:

docker compose logs -f airflow-scheduler airflow-webserver

Stop (keep data):

docker compose down

Full reset:

docker compose down -v --rmi local --remove-orphans

About

A mini "internal AI platform" for test-and-measurement style sensor/RF data.
