Commit 208b6c7

Merge pull request #7 from Jakee4488/chore/update-workflows
docs: add comprehensive operational and deployment documentation for …
2 parents 0c20e5b + 6915bba commit 208b6c7

6 files changed

Lines changed: 303 additions & 24 deletions

Operational_Documents/CICD_EXECUTION_GRAPH.md

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
+# CI/CD & Databricks Execution Graph
+
+Complete end-to-end flow from code commit to running Databricks pipelines.
+
+---
+
+## Full Pipeline Execution Graph
+
+```mermaid
+flowchart TD
+%% ── Developer Actions ──────────────────────────────────────
+DEV([👨‍💻 Developer]):::person
+
+DEV -->|git push feature/branch| PUSH[Push to Feature Branch]
+PUSH --> PR[Open Pull Request\nto main]
+
+%% ── Branch Protection ──────────────────────────────────────
+PR --> PROTECT{protect_main.yml}:::workflow
+
+PROTECT --> PC1[pr-ready-check\nNot a Draft?]
+PROTECT --> PC2[branch-name-check\nfeature/* fix/* etc]
+PROTECT --> PC3[pr-description-check\n≥ 30 chars?]
+
+PC1 & PC2 & PC3 -->|All pass| MERGE_GATE{👁️ Code Review\n1 Approval Required}
+
+MERGE_GATE -->|Approved| MERGE[Merge PR → main]
+MERGE_GATE -->|Changes requested| DEV
+
+%% ── CI Pipeline ────────────────────────────────────────────
+MERGE --> CI{ci.yml\nCI Pipeline}:::workflow
+
+subgraph CI_JOBS["🔵 CI Jobs — Run in Parallel"]
+direction TB
+CI --> LINT[lint-and-test\nruff check\nruff format\npytest -m not ci_exclude]
+CI --> SEC[security-scan\nBandit\nSafety\nSecret scan]
+LINT --> BV[bundle-validate\ndatabricks bundle validate -t dev]
+LINT & SEC --> BUILD[build\nuv build → .whl artifact]
+end
+
+BV & BUILD -->|All green| CI_PASS{✅ CI Passed}
+LINT -->|Fail| CI_FAIL[❌ CI Failed\nBlock merge]
+SEC -->|Fail| CI_FAIL
+BV -->|Fail| CI_FAIL
+BUILD -->|Fail| CI_FAIL
+
+%% ── CD Pipeline ────────────────────────────────────────────
+CI_PASS --> CD{cd.yml\nCD Pipeline}:::workflow
+
+%% ── DEV Stage ──────────────────────────────────────────────
+subgraph DEV_STAGE["🟡 Stage 1 — Deploy Dev"]
+direction TB
+CD --> D1[uv build\nBuild .whl]
+D1 --> D2[databricks bundle validate -t dev]
+D2 --> D3[databricks bundle deploy -t dev\n--var git_sha --var branch\n--var finnhub_api_key --var alphavantage_api_key]
+D3 --> D4[Upload .whl to\ndbfs:/Volumes/mlops_dev/.../packages/]
+D4 --> D5[🔥 Smoke Test\ndatabricks bundle run\nfinancial_retraining_workflow -t dev --no-wait]
+end
+
+D5 -->|Success| ACC_GATE{main branch?}
+D5 -->|Fail| FAIL_DEV[❌ Dev Deploy Failed]
+
+%% ── ACC Stage ──────────────────────────────────────────────
+ACC_GATE -->|Yes| ACC_STAGE
+
+subgraph ACC_STAGE["🟠 Stage 2 — Deploy Acceptance"]
+direction TB
+A1[uv build]
+A2[databricks bundle deploy -t acc\n--var git_sha --var branch\n--var API keys]
+A3[Upload .whl to\ndbfs:/Volumes/mlops_acc/.../packages/]
+A4[🧪 Acceptance Test\ndatabricks bundle run\nfinancial_retraining_workflow -t acc --no-wait]
+A1 --> A2 --> A3 --> A4
+end
+
+A4 -->|Success| PRD_STAGE
+
+%% ── PRD Stage ──────────────────────────────────────────────
+subgraph PRD_STAGE["🔴 Stage 3 — Deploy Production"]
+direction TB
+P1[uv build]
+P2[databricks bundle deploy -t prd\n--var git_sha --var branch\n--var API keys]
+P3[Upload .whl to\ndbfs:/Volumes/mlops_prd/.../packages/]
+P1 --> P2 --> P3
+end
+
+P3 -->|Success| PRD_OK[✅ Production Deployed]
+P3 -->|Fail| ROLLBACK
+
+subgraph ROLLBACK["🚨 Emergency Rollback"]
+direction TB
+R1[databricks bundle deploy -t prd\n--var git_sha=previous_sha]
+end
+
+ROLLBACK --> PRD_FAIL[⚠️ Rolled Back to Previous SHA]
+
+%% ── Databricks Runtime ─────────────────────────────────────
+PRD_OK --> DB_RUNTIME
+
+subgraph DB_RUNTIME["☁️ Databricks Workspace — Always-On Pipelines"]
+direction TB
+
+subgraph INGEST["📥 Ingestion Workflow — Manual / Event"]
+direction LR
+I1[collect_finnhub_stream.py\nFinnhubCollector\nWebSocket → Buffer → Delta]
+I2[run_streaming_pipeline\nFire DLT trigger]
+I1 -->|landing zone files| I2
+end
+
+subgraph DLT["⚡ Streaming DLT Pipeline — financial-streaming-dlt"]
+direction LR
+B[bronze_trades\nAuto Loader\nJSON → Delta]
+S[silver_trades\nDedup + Validate\n+ Enrich]
+G[gold_trade_features\nRSI · MACD · VWAP\nBollinger · Volume Z-score]
+B -->|dlt.read_stream| S -->|dlt.read| G
+end
+
+subgraph RETRAIN["🏆 Retraining Workflow — Nightly 00:00 UTC"]
+direction LR
+T1[train_tournament.py\n4-Model Tournament\nLightGBM · XGBoost\nRF · IsolationForest]
+T2[deploy_anomaly_model.py\nChampion/Challenger Gate\nPROMOTE or REJECT]
+T3[(UC Model Registry\nanomaly_model_champion\nalias: champion)]
+T1 -->|challenger result| T2 -->|register + alias| T3
+end
+
+subgraph MONITOR["📊 Drift Monitoring — Every 30 min"]
+direction LR
+M1[detect_drift.py\nLoad reference window\nLoad current window]
+M2[DriftDetector\nPSI on numericals\nJS on categoricals]
+M3{overall_drift?}
+M4[RetrainingTrigger\nCooldown check\nDaily limit check]
+M5[AlertManager\nWebhook + Delta audit]
+M6[🔄 Trigger Retraining Job\nWorkspaceClient.jobs.run_now]
+M1 --> M2 --> M3
+M3 -->|Yes| M4 --> M5 --> M6
+M3 -->|No| MEND[✅ No action]
+end
+
+I2 -->|triggers| DLT
+G -->|gold_trade_features| RETRAIN
+G -->|gold_trade_features| MONITOR
+M6 -->|triggers| RETRAIN
+end
+
+%% ── Manual Triggers ────────────────────────────────────────
+MANUAL([👨‍💻 Manual\nworkflow_dispatch]):::person
+MANUAL -->|target: dev/acc/prd| CD
+
+%% ── Styles ─────────────────────────────────────────────────
+classDef workflow fill:#1a1a2e,stroke:#4a90d9,color:#fff,rx:6
+classDef person fill:#2d6a4f,stroke:#52b788,color:#fff,rx:20
+classDef default fill:#16213e,stroke:#4a4a7a,color:#e0e0e0
+```
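The drift-monitoring subgraph applies PSI to numerical features and Jensen-Shannon divergence to categorical ones. The sketch below shows the textbook form of both metrics on pre-binned proportions. It is not the repository's `DriftDetector`: the function names, the `eps` smoothing, the sample proportions, and the 0.2 PSI cut-off (a common rule of thumb) are all assumptions made for illustration.

```python
import math


def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index over matching bins of proportions."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) and division by zero
        total += (a - e) * math.log(a / e)
    return total


def js_divergence(p: list, q: list) -> float:
    """Jensen-Shannon divergence in base 2, so the result lies in [0, 1]."""
    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical bin proportions for one feature.
reference = [0.25, 0.25, 0.25, 0.25]  # reference window
current = [0.10, 0.20, 0.30, 0.40]    # current window

PSI_THRESHOLD = 0.2  # assumed rule-of-thumb cut-off, not taken from the repo
drifted = psi(reference, current) > PSI_THRESHOLD
print(f"psi={psi(reference, current):.4f} "
      f"js={js_divergence(reference, current):.4f} drifted={drifted}")
```

Both metrics are zero for identical distributions and grow as the current window moves away from the reference, which is what makes a single threshold per metric workable as an `overall_drift?` decision.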
+
+---
+
+## Stage-by-Stage Legend
+
+| Symbol | Stage | Runs On |
+|---|---|---|
+| 🔵 | CI Pipeline | GitHub-hosted Ubuntu runner |
+| 🟡 | Deploy Dev | GitHub-hosted Ubuntu runner → Databricks dev |
+| 🟠 | Deploy Acceptance | GitHub-hosted Ubuntu runner → Databricks acc |
+| 🔴 | Deploy Production | GitHub-hosted Ubuntu runner → Databricks prd |
+| 🚨 | Emergency Rollback | Auto-triggers on prd failure |
+| ☁️ | Databricks Always-On | Databricks Serverless / Jobs compute |
+
+---
+
+## Timing Reference
+
+| Event | Expected Duration |
+|---|---|
+| CI (lint + test + build) | ~3–6 minutes |
+| Bundle validate | ~30 seconds |
+| Deploy to dev | ~2–3 minutes |
+| Deploy to acc | ~2–3 minutes |
+| Deploy to prd | ~2–3 minutes |
+| **Full push → production** | **~10–15 minutes** |
+| Ingestion job (5 min run) | 5 minutes |
+| DLT pipeline (trigger mode) | 5–15 minutes |
+| Retraining nightly job | 30–90 minutes (4 models) |
+| Drift monitoring (every 30 min) | 2–5 minutes |
+
+---
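The "Full push → production" row is roughly the sum of the CI row and the three deploy rows. A quick check of that arithmetic, using only the ranges from the table, is sketched below. The dictionary keys are illustrative names, and the bundle-validate time is assumed to be absorbed into the CI and deploy figures.

```python
# (low, high) minutes per stage, copied from the Timing Reference table.
stages = {
    "ci": (3, 6),
    "deploy_dev": (2, 3),
    "deploy_acc": (2, 3),
    "deploy_prd": (2, 3),
}

total_low = sum(low for low, _ in stages.values())
total_high = sum(high for _, high in stages.values())
print(f"push to production: {total_low}-{total_high} minutes")
```

This prints `push to production: 9-15 minutes`, which is in line with the table's rounded estimate of about 10 to 15 minutes.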
+
+## Trigger Summary
+
+```
+Developer push to feature branch
+
+    ├──► protect_main.yml (PR checks: naming, draft, description)
+
+    └──► [PR approved] merge to main
+
+            ├──► ci.yml (lint + test + bundle validate + build)
+            │       │
+            │       └──► cd.yml (dev → acc → prd)
+            │               │
+            │               └──► Databricks bundle deploy
+            │                       │
+            │                       └──► Smoke test (retrain job --no-wait)
+
+            └──► [Always running in Databricks]
+                    Ingestion     → every manual/scheduled run
+                    DLT           → triggered by ingestion
+                    Retraining    → nightly 00:00 UTC
+                    Drift Monitor → every 30 minutes
+```
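The `protect_main.yml` checks named in the summary (branch naming, draft status, description length) could be expressed roughly as follows. This is a minimal sketch, not the workflow's actual implementation: the `PullRequest` dataclass and `check_pull_request` function are hypothetical, the prefix list is assumed from the branch names mentioned elsewhere in this commit (`feature/*`, `fix/*`, `chore/*`, `hotfix/*`), and the 30-character minimum comes from the graph's `pr-description-check` node.

```python
from dataclasses import dataclass

# Assumed prefix list, based on the branch names this commit's docs mention.
ALLOWED_PREFIXES = ("feature/", "fix/", "chore/", "hotfix/")
MIN_DESCRIPTION_CHARS = 30  # threshold shown in the pr-description-check node


@dataclass
class PullRequest:
    """Hypothetical container for the PR fields the checks need."""
    branch: str
    description: str
    is_draft: bool


def check_pull_request(pr: PullRequest) -> list:
    """Return the names of the failed checks; an empty list means all passed."""
    failures = []
    if pr.is_draft:
        failures.append("pr-ready-check")
    if not pr.branch.startswith(ALLOWED_PREFIXES):
        failures.append("branch-name-check")
    if len(pr.description.strip()) < MIN_DESCRIPTION_CHARS:
        failures.append("pr-description-check")
    return failures


ok = PullRequest("feature/add-docs", "Adds the CI/CD execution graph document.", False)
bad = PullRequest("docs-update", "too short", True)
print(check_pull_request(ok))
print(check_pull_request(bad))
```

Returning the list of failed check names, rather than a single boolean, mirrors how the three checks appear as separate nodes that must all pass before code review.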

Operational_Documents/COMPREHENSIVE_MLOPS_SETUP_GUIDE.md

Lines changed: 9 additions & 4 deletions
@@ -4,7 +4,7 @@ This guide provides an end-to-end setup for:
 - initializing ingestion and Delta Live Tables (DLT) pipelines,
 - evaluating and selecting the best model,
 - deploying the champion model to streaming inference,
-- implementing automated weekly retraining.
+- implementing automated nightly retraining (with drift-triggered retraining on top).
 
 It is written for this repository's Databricks Asset Bundles (DAB) layout.
 
@@ -21,9 +21,10 @@ Target workflow:
 6. **Retrain weekly** (plus drift-triggered retraining if needed).
 
 Core bundle resources:
+- `resources/ingestion_workflow.yml` -> `financial_api_ingestion_workflow`
 - `resources/streaming_pipeline.yml` -> `financial_streaming_pipeline`
 - `resources/retraining_workflow.yml` -> `financial_retraining_workflow`
-- `resources/drift_monitoring.yml` -> `drift_monitoring_job`
+- `resources/drift_monitoring.yml` -> `drift_monitoring_job`
 
 ---
 
@@ -38,6 +39,10 @@ databricks auth login
 databricks bundle validate -t dev
 ```
 
+> **Branch protection is active.** Never push directly to `main`. Always work on a
+> `feature/*`, `fix/*`, `chore/*`, or `hotfix/*` branch and open a PR.
+> See `protect_main.yml` and `CICD_EXECUTION_GRAPH.md` for the full flow.
+
 ### 2.2 Unity Catalog infrastructure
 
 Run once (as principal with `CREATE CATALOG` permission):
@@ -165,8 +170,8 @@ Post-deploy validations:
 
 ## 7) Weekly Automated Retraining
 
-Current retraining schedule in `resources/retraining_workflow.yml` is daily:
-- `0 0 0 * * ?`
+Current retraining schedule in `resources/retraining_workflow.yml` is **nightly**:
+- `0 0 0 * * ?` (midnight UTC, every day)
 
 To run weekly (example: Monday 02:00 UTC), change to:

Operational_Documents/DEPLOYMENT_VALIDATION_GUIDE.md

Lines changed: 21 additions & 3 deletions
@@ -34,15 +34,23 @@ Before running the pipelines, ensure your GitHub repository has the required Env
 - `DATABRICKS_TOKEN` (Or OAuth alternatives: `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`)
 - `FINNHUB_API_KEY`
 - `ALPHAVANTAGE_API_KEY`
+4. **Branch Protection:** Confirm `protect_main.yml` is active and branch protection rules are set in:
+`Settings → Branches → Add ruleset → Require pull request + require status checks`.
+See `CICD_EXECUTION_GRAPH.md` for the full enforcement flow.
 
 ---
 
 ## Step 2: Validate Continuous Integration (CI) Pipeline
 **Workflow Source:** `.github/workflows/ci.yml`
 
-The CI pipeline runs on Push (to `main` or `develop`) and Pull Request logic. To test it:
+The CI pipeline runs on push to `main` or `develop`, and on Pull Requests to `main`.
 
-1. **Trigger:** Create a feature branch off of `develop` or `main`. Make a non-breaking code change (e.g., adding a comment), commit, and push. Open a Pull Request targeting `main`.
+> **Note:** Changes only to `Operational_Documents/**` or `README.md` are **ignored** by CI
+> via `paths-ignore` filters added to the workflow. Code changes always trigger CI.
+
+To test it:
+
+1. **Trigger:** Create a **feature branch** (e.g. `feature/my-change`) — direct pushes to `main` are blocked by `protect_main.yml`. Push the branch, then open a Pull Request targeting `main`.
 2. **Monitor Actions:** Navigate to the **Actions** tab in GitHub and click on the "CI Pipeline" run.
 3. **Verify Jobs:**
 - **Lint & Test:** Ensure `ruff` format checks pass and unit tests execute successfully (`pytest`). Verify `test-results.xml` and `coverage.xml` artifacts upload successfully.
@@ -72,6 +80,12 @@ The CD pipeline triggers on pushes to the `main` branch or manually via `workflo
 ## Step 4: Validate Manual Model Validation Gate
 **Workflow Source:** `.github/workflows/model_validation.yml`
 
+> **All active workflows:**
+> - `ci.yml` — Lint, test, bundle validate, build
+> - `cd.yml` — Deploy dev → acc → prd
+> - `protect_main.yml` — Branch protection, PR naming, draft check
+> - `model_validation.yml` — Manual model validation gate
+
 1. **Trigger:** Go to Actions, select "Model Validation Gate", and run it via `workflow_dispatch`.
 2. Provide a dummy `model_version` (e.g., `v1.2.0`).
 3. Ensure the workflow checks out successfully, runs the `ModelValidator` logic verifying PR-AUC and F1 scores, and connects seamlessly to the Unity Catalog MLflow registry.
@@ -121,7 +135,11 @@ databricks bundle deploy -t dev \
 --var="finnhub_api_key=<key>" \
 --var="alphavantage_api_key=<key>"
 
-databricks fs cp --overwrite dist/<wheel-name>.whl dbfs:/FileStore/financial_ai_mlops_<sha>.whl
+# Upload wheel to Unity Catalog Volume (not DBFS FileStore)
+WHEEL=$(ls -t dist/*.whl | head -n 1)
+databricks fs mkdirs "dbfs:/Volumes/mlops_dev/financial_transactions/packages/"
+databricks fs cp --overwrite "$WHEEL" "dbfs:/Volumes/mlops_dev/financial_transactions/packages/$(basename "$WHEEL")"
+
 databricks bundle run financial_retraining_workflow -t dev
 ```
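The `ls -t dist/*.whl | head -n 1` step in the hunk above picks the most recently modified wheel. For deployment scripts that already run in Python, the same selection could be sketched with `pathlib`. The helper names `latest_wheel` and `upload_target` are hypothetical, and unlike the shell one-liner this version raises an explicit error when `dist/` contains no wheel.

```python
from pathlib import Path


def latest_wheel(dist_dir: str = "dist") -> Path:
    """Return the most recently modified .whl file in dist_dir."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no .whl files found in {dist_dir!r}")
    return wheels[-1]


def upload_target(wheel: Path, volume_dir: str) -> str:
    """Compose the destination path: volume directory plus the wheel's file name."""
    return f"{volume_dir.rstrip('/')}/{wheel.name}"
```

A caller would pass the result of `upload_target(latest_wheel(), "dbfs:/Volumes/mlops_dev/financial_transactions/packages/")` as the destination argument of `databricks fs cp`.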

Operational_Documents/GETTING_STARTED_PIPELINE_TESTING.md

Lines changed: 17 additions & 4 deletions
@@ -38,6 +38,9 @@ Latest verified local result in this workspace (2026-04-08):
 - Runtime about 51s
 - Coverage about 46%
 
+> **Branch protection is active.** Do not push directly to `main`. Create a `feature/*` or
+> `fix/*` branch, push it, and open a Pull Request. See `protect_main.yml` for enforced checks.
+
 ## 2. What Current Tests Cover
 
 Automated tests currently validate:
@@ -80,9 +83,11 @@ Pass condition:
 ### 3.3 Execute pipeline resources
 
 ```bash
-databricks bundle run -t dev financial_streaming_pipeline
-databricks bundle run -t dev financial_retraining_workflow
-databricks bundle run -t dev drift_monitoring_job
+# Run pipelines in dependency order:
+databricks bundle run -t dev financial_api_ingestion_workflow # Step 1: ingest data
+databricks bundle run -t dev financial_streaming_pipeline # Step 2: DLT Bronze→Gold
+databricks bundle run -t dev financial_retraining_workflow # Step 3: train + deploy model
+databricks bundle run -t dev drift_monitoring_job # Step 4: drift check
 ```
 
 Pass conditions:
@@ -288,13 +293,21 @@ Treat a candidate as ready only when all are true:
 - bundle validate and deploy are green in target env,
 - streaming, retraining, and drift runs are green,
 - rollback process has been validated,
-- no blocking monitoring alerts remain.
+- no blocking monitoring alerts remain,
+- PR was reviewed and merged (not pushed directly to main),
+- `protect_main.yml` branch-name and description checks passed.
 
 ## 6. Useful Paths
 
 - `databricks.yml`
 - `project_config.yml`
+- `resources/ingestion_workflow.yml`
 - `resources/streaming_pipeline.yml`
 - `resources/retraining_workflow.yml`
 - `resources/drift_monitoring.yml`
+- `.github/workflows/ci.yml`
+- `.github/workflows/cd.yml`
+- `.github/workflows/protect_main.yml`
 - `tests/financial_transactions/`
+- `Operational_Documents/CICD_EXECUTION_GRAPH.md`
+- `Operational_Documents/FUNCTION_AND_CONFIG_REFERENCE.md`
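The dependency order introduced in section 3.3 of this file (ingestion, then DLT, then retraining, then drift check) can also be scripted so that a failed step stops the chain. The following is a minimal sketch, assuming the `databricks` CLI is on `PATH` and already authenticated. The `run_in_order` helper and its injectable `runner` parameter are hypothetical and exist mainly to keep the example testable without a workspace.

```python
import subprocess
from typing import Callable, Sequence

# Resource keys in the order given in section 3.3.
PIPELINE_ORDER = (
    "financial_api_ingestion_workflow",  # Step 1: ingest data
    "financial_streaming_pipeline",      # Step 2: DLT Bronze -> Gold
    "financial_retraining_workflow",     # Step 3: train + deploy model
    "drift_monitoring_job",              # Step 4: drift check
)


def _run_cli(cmd: Sequence[str]) -> int:
    """Run one CLI command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_in_order(target: str = "dev",
                 runner: Callable[[Sequence[str]], int] = _run_cli) -> list:
    """Run each resource in order and stop at the first non-zero exit code.

    Returns the resources that completed successfully.
    """
    completed = []
    for resource in PIPELINE_ORDER:
        cmd = ["databricks", "bundle", "run", "-t", target, resource]
        if runner(cmd) != 0:
            break  # later steps depend on this one, so do not continue
        completed.append(resource)
    return completed
```

Calling `run_in_order("dev")` issues the same four commands as the bash block, one after another; passing a stub as `runner` allows a dry run.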

Operational_Documents/README.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Centralised operational documentation for Financial AI MLOps.
 | Document | Purpose |
 |---|---|
 | [`RUNBOOK.md`](RUNBOOK.md) | On-call incident response procedures |
+| [`CICD_EXECUTION_GRAPH.md`](CICD_EXECUTION_GRAPH.md) | Full CI/CD + Databricks execution flow diagram with timing |
 
 ### Security
