| 1 | +# Financial AI MLOps |
| 2 | + |
| 3 | + |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +## Introduction |
| 8 | + |
| 9 | +**Financial AI MLOps** is an enterprise-grade machine learning operations project for detecting anomalies in financial market transactions. Built on Databricks, it provides an end-to-end framework that spans real-time streaming ingestion, feature engineering, model training and serving, and automated retraining. |
| 10 | + |
| 11 | +The primary objective is to reliably capture financial data streams (e.g., via Finnhub WebSockets), process them using Delta Live Tables (DLT), generate predictive features, and serve robust anomaly detection models capable of identifying irregular market behaviors. |
| 12 | + |
| 13 | +## Key Functionalities |
| 14 | + |
| 15 | +This repository encompasses a complete MLOps lifecycle, broken down into the following core functionalities: |
| 16 | + |
| 17 | +### 1. Data Ingestion (Streaming & Historical) |
| 18 | +* **Real-time Streaming:** Consumes real-time trade data from the Finnhub API over a continuous WebSocket stream. |
| 19 | +* **Historical Batching:** Periodically pulls historical market data using Alpha Vantage for robust training sets and baselining. |
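As a rough illustration of the streaming side, the sketch below flattens one WebSocket message into per-trade rows. The payload shape (a `type` field plus a `data` list with `s`, `p`, `v`, and `t` keys, where `t` is epoch milliseconds) is an assumption based on Finnhub's public trade message format and should be checked against the current API documentation. The function name `parse_trade_message` is hypothetical and not taken from this repository.

```python
import json
from datetime import datetime, timezone


def parse_trade_message(raw: str) -> list[dict]:
    """Flatten one WebSocket message into per-trade rows.

    Messages whose type is not "trade" (for example keep-alive pings)
    yield an empty list.
    """
    msg = json.loads(raw)
    if msg.get("type") != "trade":
        return []
    rows = []
    for trade in msg.get("data", []):
        rows.append({
            "symbol": trade["s"],
            "price": float(trade["p"]),
            "volume": float(trade["v"]),
            # Assumed: "t" is a Unix timestamp in milliseconds.
            "event_time": datetime.fromtimestamp(trade["t"] / 1000, tz=timezone.utc),
        })
    return rows
```

Rows in this shape could then be appended to the raw ingestion layer without further transformation.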
| 20 | + |
| 21 | +### 2. Data Processing & Feature Engineering (DLT) |
| 22 | +* **Delta Live Tables (DLT):** A declarative pipeline that refines raw data through three layers: |
| 23 | + * *Bronze:* Raw ingestion layer. |
| 24 | + * *Silver:* Cleaned, formatted, and validated transactional data. |
| 25 | + * *Gold:* Aggregated features ready for ML training and inference. |
| 26 | +* **Feature Store Integration:** Centralized tracking and lookup of engineered features to maintain consistency between offline training and online serving. |
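The Bronze, Silver, and Gold layering can be sketched in plain Python to show what each step contributes. This is not DLT code: the real pipeline would declare these steps as DLT tables. The validation rules and the per-minute features (`trade_count`, `total_volume`, `vwap`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_silver(bronze_rows):
    """Keep only rows with a symbol and strictly positive price and volume."""
    return [
        r for r in bronze_rows
        if r.get("symbol")
        and (r.get("price") or 0) > 0
        and (r.get("volume") or 0) > 0
    ]


def to_gold(silver_rows):
    """Aggregate validated trades into per-symbol, per-minute features."""
    buckets = defaultdict(list)
    for r in silver_rows:
        minute = r["event_time"].replace(second=0, microsecond=0)
        buckets[(r["symbol"], minute)].append(r)

    gold = []
    for (symbol, minute), trades in sorted(buckets.items()):
        total_volume = sum(t["volume"] for t in trades)
        # Volume-weighted average price for the minute.
        vwap = sum(t["price"] * t["volume"] for t in trades) / total_volume
        gold.append({
            "symbol": symbol,
            "minute": minute,
            "trade_count": len(trades),
            "total_volume": total_volume,
            "vwap": vwap,
        })
    return gold
```

Keeping validation in Silver and aggregation in Gold means a change to the feature set does not require re-validating the raw data.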
| 27 | + |
| 28 | +### 3. Model Training & Tournament |
| 29 | +* **Multi-Model Support:** Implementations for XGBoost, LightGBM, Random Forest, and Isolation Forest. |
| 30 | +* **Model Tournament:** An automated training system that concurrently trains multiple algorithms on the latest Gold data, tuning hyperparameters and evaluating relative performance across custom metrics (e.g., PR AUC, F1 Score). |
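To make the ranking step of the tournament concrete, here is a minimal sketch that scores each candidate by PR AUC (computed as average precision) and picks the best. It deliberately leaves out concurrent training and hyperparameter tuning, assumes binary labels with distinct scores, and uses hypothetical names (`average_precision`, `run_tournament`) that are not taken from this repository.

```python
def average_precision(labels, scores):
    """PR AUC as the mean of precision measured at each true positive.

    Assumes labels are 0/1 and scores are distinct (ties are not handled).
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


def run_tournament(candidates, labels):
    """Score every candidate on the same labels.

    candidates maps a model name to its anomaly scores on the evaluation set.
    Returns (winner_name, leaderboard).
    """
    leaderboard = {
        name: average_precision(labels, scores)
        for name, scores in candidates.items()
    }
    winner = max(leaderboard, key=leaderboard.get)
    return winner, leaderboard
```

PR AUC is a sensible primary metric here because anomalies are rare, and precision-recall curves are more informative than ROC curves under heavy class imbalance.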
| 31 | + |
| 32 | +### 4. Advanced Monitoring & Observability |
| 33 | +* **Data & Concept Drift Detection:** Continuously calculates Population Stability Index (PSI) and Jensen-Shannon Divergence on incoming live data against reference windows. |
| 34 | +* **Automated Retraining:** Programmatic triggers that initiate a retraining pipeline (Model Tournament) if performance degrades or substantial drift is detected. |
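The two drift statistics named above can be sketched as follows. The sketch assumes each feature has already been binned into counts over the same bin edges for the reference and live windows, and it floors empty bins at a small `eps` so the logarithms stay finite. Both choices are assumptions, not details taken from this repository.

```python
import math


def _proportions(counts, eps=1e-6):
    """Convert bin counts to proportions, flooring at eps and renormalizing."""
    total = sum(counts)
    floored = [max(c / total, eps) for c in counts]
    norm = sum(floored)
    return [p / norm for p in floored]


def psi(reference_counts, live_counts):
    """Population Stability Index: sum of (live - ref) * ln(live / ref) over bins."""
    ref = _proportions(reference_counts)
    live = _proportions(live_counts)
    return sum((lp - rp) * math.log(lp / rp) for rp, lp in zip(ref, live))


def js_divergence(reference_counts, live_counts):
    """Jensen-Shannon divergence in bits (log base 2), bounded in [0, 1]."""
    p = _proportions(reference_counts)
    q = _proportions(live_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A commonly quoted rule of thumb treats a PSI above 0.25 as a substantial shift, but the thresholds this project actually uses are not stated here.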
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## Deployment Strategies |
| 39 | + |
| 40 | +To keep model releases safe and free of downtime, this project uses the following deployment and release strategies: |
| 41 | + |
| 42 | +### Champion / Challenger Gating |
| 43 | +Before any newly trained model is deployed to production, it must pass a "Champion vs. Challenger" validation phase. The system automatically benchmarks the Challenger against the currently deployed Champion on the primary metric (PR AUC) and on secondary metrics (F1, Precision, Recall). The Challenger is promoted only if it outperforms the Champion by the configured thresholds. |
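One way such a gate could be expressed is sketched below. The threshold names and default values in `GateConfig` are hypothetical, as is the rule that a secondary metric may regress by a small tolerance. The project's real thresholds come from its own configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateConfig:
    # Hypothetical thresholds, for illustration only.
    min_primary_gain: float = 0.01    # required PR AUC improvement
    max_secondary_drop: float = 0.02  # tolerated regression per secondary metric


def should_promote(champion, challenger, config=GateConfig()):
    """Compare two metric dicts and return (promote, reasons_for_rejection)."""
    reasons = []
    gain = challenger["pr_auc"] - champion["pr_auc"]
    if gain < config.min_primary_gain:
        reasons.append(f"pr_auc gain {gain:.4f} below {config.min_primary_gain}")
    for metric in ("f1", "precision", "recall"):
        drop = champion[metric] - challenger[metric]
        if drop > config.max_secondary_drop:
            reasons.append(f"{metric} dropped by {drop:.4f}")
    return (not reasons), reasons
```

Returning the rejection reasons alongside the decision makes the gate's outcome easy to log and audit.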
| 44 | + |
| 45 | +### A/B Testing & Traffic Splitting |
| 46 | +Production deployments support A/B testing directly within Databricks Model Serving. Traffic can be split by weight between the stable model and a newly promoted model, so the team can measure real-world performance differences while exposing only a fraction of requests to the new model. |
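A weighted split can be described as a small routing configuration. The helper below builds one as a plain dictionary. The field names (`traffic_config`, `routes`, `served_model_name`, `traffic_percentage`) reflect my understanding of the serving endpoint configuration payload and should be verified against the current Databricks documentation. The helper name is hypothetical.

```python
def build_traffic_config(stable_name, candidate_name, candidate_percent):
    """Build a two-route traffic split whose percentages sum to 100.

    candidate_percent is the integer share of requests sent to the new model.
    """
    if not isinstance(candidate_percent, int) or not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be an integer between 0 and 100")
    routes = [
        {"served_model_name": stable_name,
         "traffic_percentage": 100 - candidate_percent},
        {"served_model_name": candidate_name,
         "traffic_percentage": candidate_percent},
    ]
    return {"traffic_config": {"routes": routes}}
```

Starting with a small candidate share and raising it in steps limits exposure while evidence accumulates.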
| 47 | + |
| 48 | +### Automated Rollbacks |
| 49 | +Model serving includes a dedicated `rollback_manager`. If continuous monitoring detects severe performance degradation or latency spikes in the newest deployment, the system can automatically orchestrate a rollback to the previous known-good model state, minimizing business risk. |
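The decision logic behind an automated rollback might look like the sketch below. It does not reproduce the internals of `rollback_manager`, which are not shown in this document. The `HealthSnapshot` fields, the thresholds, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSnapshot:
    pr_auc: float
    p95_latency_ms: float


def needs_rollback(current, baseline, max_pr_auc_drop=0.05, max_latency_ratio=2.0):
    """Flag the live model if quality fell or latency spiked relative to baseline."""
    degraded = baseline.pr_auc - current.pr_auc > max_pr_auc_drop
    slow = current.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio
    return degraded or slow


def pick_rollback_target(deployment_history):
    """Return the most recent healthy version deployed before the live one.

    deployment_history is a list of (version, was_healthy) tuples ordered
    oldest first; the last entry is the live version. Returns None when no
    earlier healthy version exists.
    """
    for version, was_healthy in reversed(deployment_history[:-1]):
        if was_healthy:
            return version
    return None
```

Separating the "should we roll back" check from the "roll back to what" lookup keeps each piece easy to test in isolation.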
| 50 | + |
| 51 | +### CI/CD with Databricks Asset Bundles |
| 52 | +Infrastructure as Code (IaC) and pipeline automation are handled using **Databricks Asset Bundles (DABs)** (`databricks.yml`). Changes merged to the main branch trigger GitHub Actions that automatically validate code, run tests, and deploy the updated resources (Jobs, DLT Pipelines, Workflows) to Databricks environments (Dev, Acc, Prd). |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +## Getting Started |
| 57 | + |
| 58 | +For a comprehensive guide on setting up the infrastructure, running pipelines, and managing operations, please refer to our internal [Operational Documents](./Operational_Documents/): |
| 59 | + |
| 60 | +* 🧭 [Project Structure Map](./Operational_Documents/PROJECT_STRUCTURE.md) |
| 61 | +* 🚀 [Comprehensive MLOps Setup Guide](./Operational_Documents/COMPREHENSIVE_MLOPS_SETUP_GUIDE.md) |
| 62 | +* ✅ [Pipeline Testing & Validation](./Operational_Documents/GETTING_STARTED_PIPELINE_TESTING.md) |
| 63 | +* 📖 [Operations Runbook](./Operational_Documents/RUNBOOK.md) |
| 64 | + |
| 65 | +*Note: Canonical configurations are found in `project_config.yml` and `pyproject.toml`.* |