|
| 1 | +# CI/CD & Databricks Execution Graph |
| 2 | + |
| 3 | +End-to-end flow from a code commit to running Databricks pipelines. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Full Pipeline Execution Graph |
| 8 | + |
| 9 | +```mermaid |
| 10 | +flowchart TD |
| 11 | + %% ── Developer Actions ────────────────────────────────────── |
| 12 | + DEV([👨‍💻 Developer]):::person |
| 13 | +
|
| 14 | + DEV -->|git push feature/branch| PUSH[Push to Feature Branch] |
| 15 | + PUSH --> PR[Open Pull Request\nto main] |
| 16 | +
|
| 17 | + %% ── Branch Protection ────────────────────────────────────── |
| 18 | + PR --> PROTECT{protect_main.yml}:::workflow |
| 19 | +
|
| 20 | + PROTECT --> PC1[pr-ready-check\nNot a Draft?] |
| 21 | + PROTECT --> PC2[branch-name-check\nfeature/* fix/* etc] |
| 22 | + PROTECT --> PC3[pr-description-check\n≥ 30 chars?] |
| 23 | +
|
| 24 | + PC1 & PC2 & PC3 -->|All pass| MERGE_GATE{👁️ Code Review\n1 Approval Required} |
| 25 | +
|
| 26 | + MERGE_GATE -->|Approved| MERGE[Merge PR → main] |
| 27 | + MERGE_GATE -->|Changes requested| DEV |
| 28 | +
|
| 29 | + %% ── CI Pipeline ──────────────────────────────────────────── |
| 30 | + MERGE --> CI{ci.yml\nCI Pipeline}:::workflow |
| 31 | +
|
| 32 | + subgraph CI_JOBS["🔵 CI Jobs — Run in Parallel"] |
| 33 | + direction TB |
| 34 | + CI --> LINT[lint-and-test\nruff check\nruff format\npytest -m not ci_exclude] |
| 35 | + CI --> SEC[security-scan\nBandit\nSafety\nSecret scan] |
| 36 | + LINT --> BV[bundle-validate\ndatabricks bundle validate -t dev] |
| 37 | + LINT & SEC --> BUILD[build\nuv build → .whl artifact] |
| 38 | + end |
| 39 | +
|
| 40 | + BV & BUILD -->|All green| CI_PASS{✅ CI Passed} |
| 41 | + LINT -->|Fail| CI_FAIL[❌ CI Failed\nBlock deploy] |
| 42 | + SEC -->|Fail| CI_FAIL |
| 43 | + BV -->|Fail| CI_FAIL |
| 44 | + BUILD -->|Fail| CI_FAIL |
| 45 | +
|
| 46 | + %% ── CD Pipeline ──────────────────────────────────────────── |
| 47 | + CI_PASS --> CD{cd.yml\nCD Pipeline}:::workflow |
| 48 | +
|
| 49 | + %% ── DEV Stage ────────────────────────────────────────────── |
| 50 | + subgraph DEV_STAGE["🟡 Stage 1 — Deploy Dev"] |
| 51 | + direction TB |
| 52 | + CD --> D1[uv build\nBuild .whl] |
| 53 | + D1 --> D2[databricks bundle validate -t dev] |
| 54 | + D2 --> D3[databricks bundle deploy -t dev\n--var git_sha --var branch\n--var finnhub_api_key --var alphavantage_api_key] |
| 55 | + D3 --> D4[Upload .whl to\ndbfs:/Volumes/mlops_dev/.../packages/] |
| 56 | + D4 --> D5[🔥 Smoke Test\ndatabricks bundle run\nfinancial_retraining_workflow -t dev --no-wait] |
| 57 | + end |
| 58 | +
|
| 59 | + D5 -->|Success| ACC_GATE{main branch?} |
| 60 | + D5 -->|Fail| FAIL_DEV[❌ Dev Deploy Failed] |
| 61 | +
|
| 62 | + %% ── ACC Stage ────────────────────────────────────────────── |
| 63 | + ACC_GATE -->|Yes| ACC_STAGE |
| 64 | +
|
| 65 | + subgraph ACC_STAGE["🟠 Stage 2 — Deploy Acceptance"] |
| 66 | + direction TB |
| 67 | + A1[uv build] |
| 68 | + A2[databricks bundle deploy -t acc\n--var git_sha --var branch\n--var API keys] |
| 69 | + A3[Upload .whl to\ndbfs:/Volumes/mlops_acc/.../packages/] |
| 70 | + A4[🧪 Acceptance Test\ndatabricks bundle run\nfinancial_retraining_workflow -t acc --no-wait] |
| 71 | + A1 --> A2 --> A3 --> A4 |
| 72 | + end |
| 73 | +
|
| 74 | + A4 -->|Success| PRD_STAGE |
| 75 | +
|
| 76 | + %% ── PRD Stage ────────────────────────────────────────────── |
| 77 | + subgraph PRD_STAGE["🔴 Stage 3 — Deploy Production"] |
| 78 | + direction TB |
| 79 | + P1[uv build] |
| 80 | + P2[databricks bundle deploy -t prd\n--var git_sha --var branch\n--var API keys] |
| 81 | + P3[Upload .whl to\ndbfs:/Volumes/mlops_prd/.../packages/] |
| 82 | + P1 --> P2 --> P3 |
| 83 | + end |
| 84 | +
|
| 85 | + P3 -->|Success| PRD_OK[✅ Production Deployed] |
| 86 | + P3 -->|Fail| ROLLBACK |
| 87 | +
|
| 88 | + subgraph ROLLBACK["🚨 Emergency Rollback"] |
| 89 | + direction TB |
| 90 | + R1[databricks bundle deploy -t prd\n--var git_sha=previous_sha] |
| 91 | + end |
| 92 | +
|
| 93 | + ROLLBACK --> PRD_FAIL[⚠️ Rolled Back to Previous SHA] |
| 94 | +
|
| 95 | + %% ── Databricks Runtime ───────────────────────────────────── |
| 96 | + PRD_OK --> DB_RUNTIME |
| 97 | +
|
| 98 | + subgraph DB_RUNTIME["☁️ Databricks Workspace — Always-On Pipelines"] |
| 99 | + direction TB |
| 100 | +
|
| 101 | + subgraph INGEST["📥 Ingestion Workflow — Manual / Event"] |
| 102 | + direction LR |
| 103 | + I1[collect_finnhub_stream.py\nFinnhubCollector\nWebSocket → Buffer → Delta] |
| 104 | + I2[run_streaming_pipeline\nFire DLT trigger] |
| 105 | + I1 -->|landing zone files| I2 |
| 106 | + end |
| 107 | +
|
| 108 | + subgraph DLT["⚡ Streaming DLT Pipeline — financial-streaming-dlt"] |
| 109 | + direction LR |
| 110 | + B[bronze_trades\nAuto Loader\nJSON → Delta] |
| 111 | + S[silver_trades\nDedup + Validate\n+ Enrich] |
| 112 | + G[gold_trade_features\nRSI · MACD · VWAP\nBollinger · Volume Z-score] |
| 113 | + B -->|dlt.read_stream| S -->|dlt.read| G |
| 114 | + end |
| 115 | +
|
| 116 | + subgraph RETRAIN["🏆 Retraining Workflow — Nightly 00:00 UTC"] |
| 117 | + direction LR |
| 118 | + T1[train_tournament.py\n4-Model Tournament\nLightGBM · XGBoost\nRF · IsolationForest] |
| 119 | + T2[deploy_anomaly_model.py\nChampion/Challenger Gate\nPROMOTE or REJECT] |
| 120 | + T3[(UC Model Registry\nanomaly_model_champion\nalias: champion)] |
| 121 | + T1 -->|challenger result| T2 -->|register + alias| T3 |
| 122 | + end |
| 123 | +
|
| 124 | + subgraph MONITOR["📊 Drift Monitoring — Every 30 min"] |
| 125 | + direction LR |
| 126 | + M1[detect_drift.py\nLoad reference window\nLoad current window] |
| 127 | + M2[DriftDetector\nPSI on numericals\nJS on categoricals] |
| 128 | + M3{overall_drift?} |
| 129 | + M4[RetrainingTrigger\nCooldown check\nDaily limit check] |
| 130 | + M5[AlertManager\nWebhook + Delta audit] |
| 131 | + M6[🔄 Trigger Retraining Job\nWorkspaceClient.jobs.run_now] |
| 132 | + M1 --> M2 --> M3 |
| 133 | + M3 -->|Yes| M4 --> M5 --> M6 |
| 134 | + M3 -->|No| MEND[✅ No action] |
| 135 | + end |
| 136 | +
|
| 137 | + I2 -->|triggers| DLT |
| 138 | + G -->|gold_trade_features| RETRAIN |
| 139 | + G -->|gold_trade_features| MONITOR |
| 140 | + M6 -->|triggers| RETRAIN |
| 141 | + end |
| 142 | +
|
| 143 | + %% ── Manual Triggers ──────────────────────────────────────── |
| 144 | + MANUAL([👨‍💻 Manual\nworkflow_dispatch]):::person |
| 145 | + MANUAL -->|target: dev/acc/prd| CD |
| 146 | +
|
| 147 | + %% ── Styles ───────────────────────────────────────────────── |
| 148 | + classDef workflow fill:#1a1a2e,stroke:#4a90d9,color:#fff,rx:6 |
| 149 | + classDef person fill:#2d6a4f,stroke:#52b788,color:#fff,rx:20 |
| 150 | + classDef default fill:#16213e,stroke:#4a4a7a,color:#e0e0e0 |
| 151 | +``` |
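
The drift-monitoring branch of the graph names PSI for numerical features and JS (Jensen-Shannon) divergence for categorical ones without showing either. A minimal standard-library sketch of both measures follows. The function names, the 10-bin histogram, and the smoothing constant are assumptions for illustration and are not taken from `DriftDetector`.

```python
import math
from collections import Counter

EPS = 1e-6  # assumed smoothing constant so empty bins do not produce log(0)


def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges are derived from the reference sample; values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant sample

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), EPS) for c in counts]

    ref_p, cur_p = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def js_divergence(reference, current):
    """Jensen-Shannon divergence (log base 2, range 0..1) of two categorical samples."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    total = 0.0
    for category in set(ref_counts) | set(cur_counts):
        p = ref_counts[category] / len(reference)
        q = cur_counts[category] / len(current)
        m = (p + q) / 2
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total
```

A PSI above roughly 0.2 is a common rule of thumb for a significant shift. The thresholds this project uses to set `overall_drift` are not stated in the graph.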
| 152 | + |
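
The `RetrainingTrigger` node lists a cooldown check and a daily limit check before the retraining job is fired. One way such a guard could look is sketched below. The class name `RetrainingGuard`, its fields, and the default values are hypothetical, and the actual job submission (shown in the graph as `WorkspaceClient.jobs.run_now`) is deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RetrainingGuard:
    """Hypothetical stand-in for the cooldown and daily-limit checks."""

    cooldown: timedelta = timedelta(hours=6)  # assumed value
    daily_limit: int = 2                      # assumed value
    history: list = field(default_factory=list)

    def should_trigger(self, now: datetime) -> bool:
        # Cooldown: refuse if the last accepted trigger is too recent.
        if self.history and now - self.history[-1] < self.cooldown:
            return False
        # Daily limit: count accepted triggers on the same calendar day.
        runs_today = sum(1 for t in self.history if t.date() == now.date())
        if runs_today >= self.daily_limit:
            return False
        self.history.append(now)
        return True


guard = RetrainingGuard(cooldown=timedelta(hours=1), daily_limit=2)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for offset in (0, 30, 120, 240):
    when = start + timedelta(minutes=offset)
    print(when.time(), guard.should_trigger(when))
```

Keeping the guard free of Databricks calls makes it easy to unit test. The caller would submit the job only when `should_trigger` returns `True`.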
| 153 | +--- |
| 154 | + |
| 155 | +## Stage-by-Stage Legend |
| 156 | + |
| 157 | +| Symbol | Stage | Runs On | |
| 158 | +|---|---|---| |
| 159 | +| 🔵 | CI Pipeline | GitHub-hosted Ubuntu runner | |
| 160 | +| 🟡 | Deploy Dev | GitHub-hosted Ubuntu runner → Databricks dev | |
| 161 | +| 🟠 | Deploy Acceptance | GitHub-hosted Ubuntu runner → Databricks acc | |
| 162 | +| 🔴 | Deploy Production | GitHub-hosted Ubuntu runner → Databricks prd | |
| 163 | +| 🚨 | Emergency Rollback | Auto-triggers on prd failure | |
| 164 | +| ☁️ | Databricks Always-On | Databricks Serverless / Jobs compute | |
| 165 | + |
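
The stages in the legend run in order, and only a production failure leads to the rollback step. The control flow can be sketched as below. This is an illustration of the ordering only: `promote`, `deploy`, and `rollback` are hypothetical names, and in the real pipeline each stage is a job in `cd.yml` rather than a Python call.

```python
def promote(deploy, stages=("dev", "acc", "prd"), rollback=None):
    """Deploy stages in order and stop at the first failure.

    `deploy(stage)` returns True on success. `rollback()` is invoked only
    when the failing stage is "prd", mirroring the Emergency Rollback row.
    Returns (completed_stages, failed_stage_or_None).
    """
    completed = []
    for stage in stages:
        if deploy(stage):
            completed.append(stage)
            continue
        if stage == "prd" and rollback is not None:
            rollback()
        return completed, stage
    return completed, None


# Usage: simulate a production failure and observe the rollback hook.
events = []
result = promote(lambda stage: stage != "prd", rollback=lambda: events.append("rollback"))
print(result, events)
```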
| 166 | +--- |
| 167 | + |
| 168 | +## Timing Reference |
| 169 | + |
| 170 | +| Event | Expected Duration | |
| 171 | +|---|---| |
| 172 | +| CI (lint + test + build) | ~3–6 minutes | |
| 173 | +| Bundle validate | ~30 seconds | |
| 174 | +| Deploy to dev | ~2–3 minutes | |
| 175 | +| Deploy to acc | ~2–3 minutes | |
| 176 | +| Deploy to prd | ~2–3 minutes | |
| 177 | +| **Full push → production** | **~10–15 minutes** | |
| 178 | +| Ingestion job (5 min run) | 5 minutes | |
| 179 | +| DLT pipeline (trigger mode) | 5–15 minutes | |
| 180 | +| Retraining nightly job | 30–90 minutes (4 models) | |
| 181 | +| Drift monitoring (every 30 min) | 2–5 minutes | |
| 182 | + |
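
As a quick sanity check on the full push-to-production figure, the per-step ranges from the table can be summed. The sketch assumes the steps run strictly one after another with no queueing or approval wait, which the table does not state.

```python
# (low, high) minutes per step, copied from the Timing Reference table.
STEP_MINUTES = {
    "ci": (3, 6),
    "bundle_validate": (0.5, 0.5),
    "deploy_dev": (2, 3),
    "deploy_acc": (2, 3),
    "deploy_prd": (2, 3),
}


def total_range(steps):
    """Sum the low and high estimates across all steps."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


low, high = total_range(STEP_MINUTES)
print(f"push to production: about {low} to {high} minutes")
```

The summed range of 9.5 to 15.5 minutes is in line with the quoted ~10–15 minutes.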
| 183 | +--- |
| 184 | + |
| 185 | +## Trigger Summary |
| 186 | + |
| 187 | +``` |
| 188 | +Developer push to feature branch |
| 189 | + │ |
| 190 | + ├──► protect_main.yml (PR checks: naming, draft, description) |
| 191 | + │ |
| 192 | + └──► [PR approved] merge to main |
| 193 | + │ |
| 194 | + ├──► ci.yml (lint + test + bundle validate + build) |
| 195 | + │ │ |
| 196 | + │ └──► cd.yml (dev → acc → prd) |
| 197 | + │ │ |
| 198 | + │ └──► Databricks bundle deploy |
| 199 | + │ │ |
| 200 | + │ └──► Smoke test (retrain job --no-wait) |
| 201 | + │ |
| 202 | + └──► [Always running in Databricks] |
| 203 | + Ingestion → on demand (manual or event-triggered) |
| 204 | + DLT → triggered by ingestion |
| 205 | + Retraining → nightly 00:00 UTC |
| 206 | + Drift Monitor → every 30 minutes |
| 207 | +``` |
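
The three PR checks run by `protect_main.yml` (draft status, branch naming, description length) amount to simple predicates. A sketch of the same logic is shown below. The function name and the prefix tuple are assumptions: the graph lists only `feature/*` and `fix/*` followed by "etc", so the full set of allowed prefixes is not known here.

```python
# Only the two prefixes named in the graph; the real list may be longer.
ALLOWED_PREFIXES = ("feature/", "fix/")
MIN_DESCRIPTION_CHARS = 30  # from the "≥ 30 chars?" check


def pr_check_failures(is_draft, branch, description):
    """Return the names of the checks that would fail for this PR."""
    failures = []
    if is_draft:
        failures.append("pr-ready-check")
    if not branch.startswith(ALLOWED_PREFIXES):
        failures.append("branch-name-check")
    if len(description.strip()) < MIN_DESCRIPTION_CHARS:
        failures.append("pr-description-check")
    return failures


print(pr_check_failures(False, "feature/add-drift-alerts",
                        "Adds webhook alerts when drift is detected."))
print(pr_check_failures(True, "wip", "todo"))
```

An empty list means all three checks pass and the PR can move on to code review.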