Write-up: Hybrid AI Data Platform for Sensor and RF Pipelines
Hybrid SignalForge is an end-to-end AI data platform for RF/sensor pipelines. It generates synthetic baseband I/Q windows with controlled impairments (AWGN at a target SNR, carrier frequency offset, phase rotation; sketched below), versions the datasets in lakeFS, runs Spark feature engineering, trains a modulation classifier tracked in MLflow, monitors drift with Evidently, and serves predictions via FastAPI. The stack is Docker Compose–first and can later move its object store to AWS S3 with minimal changes.
Highlights:
- Versioned datasets with lakeFS branches and commits
- Reproducible orchestration in Airflow (ingest → quality → features → train → drift/merge)
- Spark-based feature engineering
- MLflow experiment tracking + model registry with Production promotion
- Drift metrics exported to Prometheus and Grafana
- A FastAPI service plus a small Python SDK for consumption
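The front of the pipeline is the synthetic generator. Below is a minimal sketch of what `src/signalforge/data_gen.py` implies, assuming NumPy and a QPSK source; the function names, defaults, and constellation are illustrative, not the module's real interface:

```python
# Illustrative only: names and defaults are assumptions, not data_gen.py's API.
import numpy as np

def qpsk_window(n_symbols: int = 64, sps: int = 4, rng=None) -> np.ndarray:
    """Generate a complex baseband QPSK window with rectangular pulses."""
    rng = rng or np.random.default_rng()
    symbols = np.exp(1j * (np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, n_symbols)))
    return np.repeat(symbols, sps)  # hold each symbol for `sps` samples

def apply_impairments(x: np.ndarray, snr_db: float, cfo: float, phase: float,
                      rng=None) -> np.ndarray:
    """Apply carrier frequency offset, static phase rotation, and AWGN."""
    rng = rng or np.random.default_rng()
    n = np.arange(len(x))
    x = x * np.exp(1j * (2 * np.pi * cfo * n + phase))  # cfo in cycles/sample
    noise_var = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)  # hit target SNR
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(x.shape)
                                      + 1j * rng.standard_normal(x.shape))
    return x + noise

w = apply_impairments(qpsk_window(), snr_db=10.0, cfo=0.01, phase=0.3)
iq = np.stack([w.real, w.imag])  # the I and Q rails the pipeline stores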
Service endpoints:
- lakeFS UI: http://localhost:8000
- Airflow UI: http://localhost:8080
- MLflow UI: http://localhost:5000
- FastAPI docs: http://localhost:9000/docs
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Repository layout:

```
HYBRID-SIGNALFORGE/
├── airflow/
│   └── dags/
│       └── signalforge_pipeline.py
├── api/
│   ├── Dockerfile
│   └── app.py
├── docker/
│   ├── airflow/
│   │   └── Dockerfile
│   ├── grafana/
│   │   ├── dashboards/
│   │   └── provisioning/
│   │       ├── dashboards/
│   │       │   └── dashboard.yml
│   │       └── datasources/
│   │           └── datasource.yml
│   ├── mlflow/
│   │   └── Dockerfile
│   ├── postgres/
│   │   └── init.sql
│   ├── prometheus/
│   │   └── prometheus.yml
│   └── spark/
│       └── Dockerfile
├── spark_jobs/
│   └── feature_engineering.py
├── src/
│   └── signalforge/
│       ├── __init__.py
│       ├── config.py
│       ├── data_gen.py
│       ├── drift.py
│       ├── feature_extract.py
│       ├── lakefs_bootstrap.py
│       ├── lakefs_ops.py
│       ├── quality.py
│       └── train.py
├── .env
├── .env.example
├── .gitignore
├── docker-compose.airflow.yml
├── docker-compose.base.yml
├── docker-compose.obs.yml
├── docker-compose.spark.yml
├── docker-compose.yml
├── README.md
├── requirements.airflow.txt
├── requirements.api.txt
├── requirements.mlflow.txt
└── requirements.spark.txt
```
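`spark_jobs/feature_engineering.py` is the Spark entry point. As a sketch of the kind of per-window statistics such a job can compute, assuming windows land as parquet rows with `i`/`q` array columns and are read through lakeFS's S3-compatible endpoint (the path, repo/branch names, and columns are assumptions):

```python
# Sketch only: paths, repo/branch names, and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("signalforge-features").getOrCreate()

# lakeFS serves an S3-compatible endpoint, so a branch reads like an S3 prefix.
raw = spark.read.parquet("s3a://signalforge/ingest-branch/raw/")

# Per-sample instantaneous power, then simple per-window statistics.
samples = raw.select(
    "window_id", "label",
    F.posexplode(F.arrays_zip("i", "q")).alias("pos", "s"),
).withColumn("power", F.col("s.i") ** 2 + F.col("s.q") ** 2)

features = samples.groupBy("window_id", "label").agg(
    F.mean("power").alias("mean_power"),
    F.stddev("power").alias("std_power"),
    F.max("power").alias("peak_power"),
)

features.write.mode("overwrite").parquet("s3a://signalforge/feature-branch/features/")
```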
Prerequisites:
- Docker Engine + Docker Compose v2
- Python 3.11+
- Make (optional)
Configure the environment and start the stack:

```bash
cp .env.example .env
docker compose up -d --build
docker compose run --rm airflow-init
```

Bootstrap the lakeFS repository:

```bash
make bootstrap
```

Without Make:

```bash
docker compose exec airflow-webserver python -m signalforge.lakefs_bootstrap
```

Trigger the pipeline:

```bash
make trigger
```

Without Make:

```bash
docker compose exec airflow-webserver airflow dags trigger signalforge_end_to_end
```

Verify in Airflow:
- Open http://localhost:8080
- Login: `admin/admin`
- DAG: `signalforge_end_to_end` is green
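The DAG behind that green run could be wired as below. This is a sketch, assuming the stage modules expose `run()` callables and the Spark job is submitted with `spark-submit` from inside the container; the imports and file path are assumptions, not the shipped `signalforge_pipeline.py`:

```python
# A sketch, not the shipped DAG: callables and the spark-submit path are assumed.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

from signalforge import data_gen, drift, quality, train  # assumed run() entry points

with DAG(
    dag_id="signalforge_end_to_end",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # run via the UI or `airflow dags trigger`
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=data_gen.run)
    check = PythonOperator(task_id="quality", python_callable=quality.run)
    features = BashOperator(
        task_id="features",  # assumed in-container location of the Spark job
        bash_command="spark-submit /opt/spark_jobs/feature_engineering.py",
    )
    fit = PythonOperator(task_id="train", python_callable=train.run)
    gate = PythonOperator(task_id="drift_and_merge", python_callable=drift.run)

    ingest >> check >> features >> fit >> gate
```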
Verify in lakeFS:
- Open http://localhost:8000
- Login with the access key/secret from `.env`
- The repo shows branches, commits, and the merge into `main`
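The branch/commit/merge cycle visible in the UI maps onto three lakeFS REST calls, which is roughly all `src/signalforge/lakefs_ops.py` needs. A sketch against the documented `/api/v1` endpoints with basic auth; the environment variable names are assumptions:

```python
# Sketch: endpoint paths follow the lakeFS OpenAPI; env var names are assumed.
import os
import requests

BASE = "http://localhost:8000/api/v1"
AUTH = (os.environ["LAKEFS_ACCESS_KEY_ID"], os.environ["LAKEFS_SECRET_ACCESS_KEY"])

def create_branch(repo: str, name: str, source: str = "main") -> None:
    """Fork a zero-copy branch from `source`."""
    requests.post(f"{BASE}/repositories/{repo}/branches",
                  json={"name": name, "source": source}, auth=AUTH).raise_for_status()

def commit(repo: str, branch: str, message: str) -> str:
    """Commit staged objects on `branch`; returns the commit ID for lineage."""
    r = requests.post(f"{BASE}/repositories/{repo}/branches/{branch}/commits",
                      json={"message": message}, auth=AUTH)
    r.raise_for_status()
    return r.json()["id"]

def merge(repo: str, source: str, destination: str = "main") -> None:
    """Merge `source` into `destination` once quality/drift gates pass."""
    requests.post(f"{BASE}/repositories/{repo}/refs/{source}/merge/{destination}",
                  json={}, auth=AUTH).raise_for_status()
```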
Verify in MLflow:
- Open http://localhost:5000
- Experiment `signalforge_modulation` has runs
- Registered model `rf_modulation_classifier` is in Production
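The tracking/registry flow behind those runs uses standard MLflow calls. In this sketch the data, model, and metric are placeholders; only the experiment and registered-model names come from the write-up:

```python
# Sketch with placeholder data/model; only the MLflow calls mirror the pipeline.
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow.tracking import MlflowClient
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("signalforge_modulation")

X, y = np.random.randn(400, 8), np.random.randint(0, 4, 400)  # stand-in features

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200).fit(X[:300], y[:300])
    mlflow.log_metric("val_accuracy", model.score(X[300:], y[300:]))
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="rf_modulation_classifier")

# Promote the newest version; "Production" is one of MLflow's built-in stages.
client = MlflowClient()
version = client.get_latest_versions("rf_modulation_classifier", stages=["None"])[0]
client.transition_model_version_stage("rf_modulation_classifier",
                                      version.version, stage="Production")
```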
Call the API:

Health:

```bash
curl -s http://localhost:9000/health
```

Model:

```bash
curl -s http://localhost:9000/model
```

Predict:

```bash
curl -s -X POST http://localhost:9000/predict \
  -H "Content-Type: application/json" \
  -d '{"i":[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.9,0.8,0.7,0.6,0.5,0.4],"q":[0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1]}'
```

Check the monitoring stack:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (`admin/admin`)
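The write-up's small Python SDK isn't reproduced here; a `requests`-based client equivalent to the Predict call above looks like this (the function and constant names are illustrative):

```python
# Not the shipped SDK: a minimal requests client matching the curl call above.
import requests

API_URL = "http://localhost:9000"

def predict(i: list[float], q: list[float], timeout: float = 10.0) -> dict:
    """Send one I/Q window to /predict and return the decoded JSON response."""
    resp = requests.post(f"{API_URL}/predict", json={"i": i, "q": q}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(predict(i=[0.1, 0.2] * 8, q=[0.0, 0.1] * 8))  # 16-sample toy window
```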
Design notes:
- Spark submission runs inside Docker for local simplicity. In production, use Spark on Kubernetes or a managed Spark service and submit via an operator/API.
- Drift is computed against a fixed baseline run ID for clarity (see the sketch after this list). Production setups usually track baselines in metadata and compare rolling windows.
- Governance is lightweight (lakeFS commits plus stored quality/drift reports), which keeps the system lean but audit-friendly.
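Here is the drift sketch referenced above. It assumes the legacy Evidently `Report` API and a Prometheus Pushgateway at `localhost:9091`; this stack may instead expose drift gauges for scraping, and the `as_dict()` field names vary across Evidently versions:

```python
# Assumptions: legacy Evidently Report API; a Pushgateway at localhost:9091.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def compute_and_export_drift(reference: pd.DataFrame, current: pd.DataFrame) -> bool:
    """Compare current features to the baseline and export a drift gauge."""
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    result = report.as_dict()["metrics"][0]["result"]  # DatasetDriftMetric first

    registry = CollectorRegistry()
    Gauge("signalforge_drift_share", "Share of drifted feature columns",
          registry=registry).set(result["share_of_drifted_columns"])
    push_to_gateway("localhost:9091", job="signalforge_drift", registry=registry)
    return bool(result["dataset_drift"])
```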
Operations:

Start or rebuild:

```bash
docker compose up -d --build
```

Logs:

```bash
docker compose logs -f airflow-scheduler airflow-webserver
```

Stop (keep data):

```bash
docker compose down
```

Full reset:

```bash
docker compose down -v --rmi local --remove-orphans
```