Write-up: Hybrid AI Data Platform for Sensor and RF Pipelines
Hybrid SignalForge is an end-to-end AI data platform for RF/sensor pipelines. It generates synthetic baseband I/Q windows with controlled impairments (AWGN at a target SNR, carrier frequency offset, phase rotation; sketched below), versions the datasets in lakeFS, runs Spark feature engineering, trains a modulation classifier tracked in MLflow, monitors drift with Evidently, and serves predictions via FastAPI. The stack is Docker Compose–first and can later move its object store to AWS S3 with minimal changes.
Highlights:
- Versioned datasets with lakeFS branches and commits
- Reproducible orchestration in Airflow (ingest → quality → features → train → drift/merge)
- Spark-based feature engineering
- MLflow experiment tracking + model registry with Production promotion
- Drift metrics exported to Prometheus and Grafana
- A FastAPI service plus a small Python SDK for consumption
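The front of the pipeline is the synthetic generator. Below is a minimal sketch of what `src/signalforge/data_gen.py` implies, assuming NumPy and a QPSK source; the function names, defaults, and constellation are illustrative, not the module's real interface:

```python
# Illustrative only: names and defaults are assumptions, not data_gen.py's API.
import numpy as np

def qpsk_window(n_symbols: int = 64, sps: int = 4, rng=None) -> np.ndarray:
    """Generate a complex baseband QPSK window with rectangular pulses."""
    rng = rng or np.random.default_rng()
    symbols = np.exp(1j * (np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, n_symbols)))
    return np.repeat(symbols, sps)  # hold each symbol for `sps` samples

def apply_impairments(x: np.ndarray, snr_db: float, cfo: float, phase: float,
                      rng=None) -> np.ndarray:
    """Apply carrier frequency offset, static phase rotation, and AWGN."""
    rng = rng or np.random.default_rng()
    n = np.arange(len(x))
    x = x * np.exp(1j * (2 * np.pi * cfo * n + phase))  # cfo in cycles/sample
    noise_var = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)  # hit target SNR
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(x.shape)
                                      + 1j * rng.standard_normal(x.shape))
    return x + noise

w = apply_impairments(qpsk_window(), snr_db=10.0, cfo=0.01, phase=0.3)
iq = np.stack([w.real, w.imag])  # the I and Q rails the pipeline stores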
Service endpoints:
- lakeFS UI: http://localhost:8000
- Airflow UI: http://localhost:8080
- MLflow UI: http://localhost:5000
- FastAPI docs: http://localhost:9000/docs
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Repository layout:

```
HYBRID-SIGNALFORGE/
├── airflow/
│   └── dags/
│       └── signalforge_pipeline.py
├── api/
│   ├── Dockerfile
│   └── app.py
├── docker/
│   ├── airflow/
│   │   └── Dockerfile
│   ├── grafana/
│   │   ├── dashboards/
│   │   └── provisioning/
│   │       ├── dashboards/
│   │       │   └── dashboard.yml
│   │       └── datasources/
│   │           └── datasource.yml
│   ├── mlflow/
│   │   └── Dockerfile
│   ├── postgres/
│   │   └── init.sql
│   ├── prometheus/
│   │   └── prometheus.yml
│   └── spark/
│       └── Dockerfile
├── spark_jobs/
│   └── feature_engineering.py
├── src/
│   └── signalforge/
│       ├── __init__.py
│       ├── config.py
│       ├── data_gen.py
│       ├── drift.py
│       ├── feature_extract.py
│       ├── lakefs_bootstrap.py
│       ├── lakefs_ops.py
│       ├── quality.py
│       └── train.py
├── .env
├── .env.example
├── .gitignore
├── docker-compose.airflow.yml
├── docker-compose.base.yml
├── docker-compose.obs.yml
├── docker-compose.spark.yml
├── docker-compose.yml
├── README.md
├── requirements.airflow.txt
├── requirements.api.txt
├── requirements.mlflow.txt
└── requirements.spark.txt
```
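`spark_jobs/feature_engineering.py` is the Spark entry point. As a sketch of the kind of per-window statistics such a job can compute, assuming windows land as parquet rows with `i`/`q` array columns and are read through lakeFS's S3-compatible endpoint (the path, repo/branch names, and columns are assumptions):

```python
# Sketch only: paths, repo/branch names, and column names are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("signalforge-features").getOrCreate()

# lakeFS serves an S3-compatible endpoint, so a branch reads like an S3 prefix.
raw = spark.read.parquet("s3a://signalforge/ingest-branch/raw/")

# Per-sample instantaneous power, then simple per-window statistics.
samples = raw.select(
    "window_id", "label",
    F.posexplode(F.arrays_zip("i", "q")).alias("pos", "s"),
).withColumn("power", F.col("s.i") ** 2 + F.col("s.q") ** 2)

features = samples.groupBy("window_id", "label").agg(
    F.mean("power").alias("mean_power"),
    F.stddev("power").alias("std_power"),
    F.max("power").alias("peak_power"),
)

features.write.mode("overwrite").parquet("s3a://signalforge/feature-branch/features/")
```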
Prerequisites:
- Docker Engine + Docker Compose v2
- Python 3.11+
- Make (optional)
Configure the environment and start the stack:

```bash
cp .env.example .env
docker compose up -d --build
docker compose run --rm airflow-init
```

Bootstrap the lakeFS repository:

```bash
make bootstrap
```

Without Make:

```bash
docker compose exec airflow-webserver python -m signalforge.lakefs_bootstrap
```

Trigger the pipeline:

```bash
make trigger
```

Without Make:

```bash
docker compose exec airflow-webserver airflow dags trigger signalforge_end_to_end
```

Verify in Airflow:
- Open http://localhost:8080
- Login: `admin/admin`
- DAG: `signalforge_end_to_end` is green
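The DAG behind that green run could be wired as below. This is a sketch, assuming the stage modules expose `run()` callables and the Spark job is submitted with `spark-submit` from inside the container; the imports and file path are assumptions, not the shipped `signalforge_pipeline.py`:

```python
# A sketch, not the shipped DAG: callables and the spark-submit path are assumed.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

from signalforge import data_gen, drift, quality, train  # assumed run() entry points

with DAG(
    dag_id="signalforge_end_to_end",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # run via the UI or `airflow dags trigger`
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=data_gen.run)
    check = PythonOperator(task_id="quality", python_callable=quality.run)
    features = BashOperator(
        task_id="features",  # assumed in-container location of the Spark job
        bash_command="spark-submit /opt/spark_jobs/feature_engineering.py",
    )
    fit = PythonOperator(task_id="train", python_callable=train.run)
    gate = PythonOperator(task_id="drift_and_merge", python_callable=drift.run)

    ingest >> check >> features >> fit >> gate
```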
Verify in lakeFS:
- Open http://localhost:8000
- Login with the access key/secret from `.env`
- The repo shows branches, commits, and the merge into `main`
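The branch/commit/merge cycle visible in the UI maps onto three lakeFS REST calls, which is roughly all `src/signalforge/lakefs_ops.py` needs. A sketch against the documented `/api/v1` endpoints with basic auth; the environment variable names are assumptions:

```python
# Sketch: endpoint paths follow the lakeFS OpenAPI; env var names are assumed.
import os
import requests

BASE = "http://localhost:8000/api/v1"
AUTH = (os.environ["LAKEFS_ACCESS_KEY_ID"], os.environ["LAKEFS_SECRET_ACCESS_KEY"])

def create_branch(repo: str, name: str, source: str = "main") -> None:
    """Fork a zero-copy branch from `source`."""
    requests.post(f"{BASE}/repositories/{repo}/branches",
                  json={"name": name, "source": source}, auth=AUTH).raise_for_status()

def commit(repo: str, branch: str, message: str) -> str:
    """Commit staged objects on `branch`; returns the commit ID for lineage."""
    r = requests.post(f"{BASE}/repositories/{repo}/branches/{branch}/commits",
                      json={"message": message}, auth=AUTH)
    r.raise_for_status()
    return r.json()["id"]

def merge(repo: str, source: str, destination: str = "main") -> None:
    """Merge `source` into `destination` once quality/drift gates pass."""
    requests.post(f"{BASE}/repositories/{repo}/refs/{source}/merge/{destination}",
                  json={}, auth=AUTH).raise_for_status()
```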
Verify in MLflow:
- Open http://localhost:5000
- Experiment `signalforge_modulation` has runs
- Registered model `rf_modulation_classifier` is in Production
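The tracking/registry flow behind those runs uses standard MLflow calls. In this sketch the data, model, and metric are placeholders; only the experiment and registered-model names come from the write-up:

```python
# Sketch with placeholder data/model; only the MLflow calls mirror the pipeline.
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow.tracking import MlflowClient
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("signalforge_modulation")

X, y = np.random.randn(400, 8), np.random.randint(0, 4, 400)  # stand-in features

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200).fit(X[:300], y[:300])
    mlflow.log_metric("val_accuracy", model.score(X[300:], y[300:]))
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="rf_modulation_classifier")

# Promote the newest version; "Production" is one of MLflow's built-in stages.
client = MlflowClient()
version = client.get_latest_versions("rf_modulation_classifier", stages=["None"])[0]
client.transition_model_version_stage("rf_modulation_classifier",
                                      version.version, stage="Production")
```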
Call the API:

Health:

```bash
curl -s http://localhost:9000/health
```

Model:

```bash
curl -s http://localhost:9000/model
```

Predict:

```bash
curl -s -X POST http://localhost:9000/predict \
  -H "Content-Type: application/json" \
  -d '{"i":[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.9,0.8,0.7,0.6,0.5,0.4],"q":[0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.1]}'
```

Check the monitoring stack:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (`admin/admin`)
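The write-up's small Python SDK isn't reproduced here; a `requests`-based client equivalent to the Predict call above looks like this (the function and constant names are illustrative):

```python
# Not the shipped SDK: a minimal requests client matching the curl call above.
import requests

API_URL = "http://localhost:9000"

def predict(i: list[float], q: list[float], timeout: float = 10.0) -> dict:
    """Send one I/Q window to /predict and return the decoded JSON response."""
    resp = requests.post(f"{API_URL}/predict", json={"i": i, "q": q}, timeout=timeout)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(predict(i=[0.1, 0.2] * 8, q=[0.0, 0.1] * 8))  # 16-sample toy window
```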
Design notes:
- Spark submission runs inside Docker for local simplicity. In production, use Spark on Kubernetes or a managed Spark service and submit via an operator/API.
- Drift is computed against a fixed baseline run ID for clarity (see the sketch after this list). Production setups usually track baselines in metadata and compare rolling windows.
- Governance is lightweight (lakeFS commits plus stored quality/drift reports), which keeps the system lean but audit-friendly.
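Here is the drift sketch referenced above. It assumes the legacy Evidently `Report` API and a Prometheus Pushgateway at `localhost:9091`; this stack may instead expose drift gauges for scraping, and the `as_dict()` field names vary across Evidently versions:

```python
# Assumptions: legacy Evidently Report API; a Pushgateway at localhost:9091.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def compute_and_export_drift(reference: pd.DataFrame, current: pd.DataFrame) -> bool:
    """Compare current features to the baseline and export a drift gauge."""
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    result = report.as_dict()["metrics"][0]["result"]  # DatasetDriftMetric first

    registry = CollectorRegistry()
    Gauge("signalforge_drift_share", "Share of drifted feature columns",
          registry=registry).set(result["share_of_drifted_columns"])
    push_to_gateway("localhost:9091", job="signalforge_drift", registry=registry)
    return bool(result["dataset_drift"])
```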
Operations:

Start or rebuild:

```bash
docker compose up -d --build
```

Logs:

```bash
docker compose logs -f airflow-scheduler airflow-webserver
```

Stop (keep data):

```bash
docker compose down
```

Full reset:

```bash
docker compose down -v --rmi local --remove-orphans
```