
Commit 54849d6

SecAI-Hub and claude committed
Implement M46: Operational maturity
Bootstrap trust gap: install docs now verify image signature (cosign)
before the unverified rebase, with documented rationale for the two-step
bootstrap flow.

CI improvements: removed blanket paths-ignore for .md files (all changes
now trigger CI), added ruff lint and bandit security scan to Python job,
split Python tests into unit/integration and adversarial/acceptance
suites, added docs-validation job (broken links, required docs, test
counts format).

Code quality: fixed 63 unused imports across services and tests (ruff
auto-fix), fixed ambiguous variable names, fixed unused local variables.

New documentation:
- Production-readiness checklist (formal release gate)
- SLOs (availability, latency, correctness targets, alerting thresholds)
- Release channel policy (stable/candidate/dev, versioning, upgrade paths)
- Support lifecycle (hardware matrix, driver versions, deprecation policy)
- Sample verification output for verify-release.sh

README: expanded CI evidence table to all 10 jobs with workflow links,
updated milestone count to 46.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9594ed2 commit 54849d6

31 files changed

Lines changed: 869 additions & 98 deletions

.github/workflows/ci.yml

Lines changed: 104 additions & 7 deletions
@@ -3,11 +3,7 @@ name: CI
 on:
   push:
     branches: [main]
-    paths-ignore:
-      - "**.md"
   pull_request:
-    paths-ignore:
-      - "**.md"
   workflow_dispatch:

 concurrency:
@@ -58,7 +54,7 @@ jobs:
           python-version: "3.12"

       - name: Install dependencies
-        run: pip install pyyaml flask requests pytest
+        run: pip install pyyaml flask requests pytest ruff bandit

       - name: Lint (syntax check)
         run: |
@@ -76,10 +72,25 @@ jobs:
           python -m py_compile services/agent/agent/capabilities.py
           python -m py_compile services/agent/agent/sandbox.py

-      - name: Test
+      - name: Ruff lint
+        run: ruff check services/ tests/ --select E,F,W --ignore E501,E402
+
+      - name: Bandit security scan
+        run: |
+          bandit -r services/ -ll --skip B101,B404,B603 -f txt || {
+            echo "::warning::Bandit found potential security issues (see above)"
+            true
+          }
+
+      - name: Test (unit + integration)
         env:
           PYTHONPATH: services
-        run: python -m pytest tests/ -v
+        run: python -m pytest tests/ -v --ignore=tests/test_adversarial.py --ignore=tests/test_m5_acceptance.py -x
+
+      - name: Test (adversarial + acceptance)
+        env:
+          PYTHONPATH: services
+        run: python -m pytest tests/test_adversarial.py tests/test_m5_acceptance.py -v --tb=short

   shellcheck:
     name: Shell Script Lint
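
The two pytest steps in this hunk repeat the same pair of file names: the first step ignores them and the second runs only them. With `-x`, pytest stops the first step at the first failure, while `--tb=short` only shortens tracebacks in the second. As a minimal sketch (the helper names and the `SLOW_SUITES` list are hypothetical and not part of this commit), both argument lists could be derived from one source so they cannot drift apart:

```python
# Hypothetical helper, not part of the commit: build both pytest argument
# lists from a single list of "slow" suites.
SLOW_SUITES = ["tests/test_adversarial.py", "tests/test_m5_acceptance.py"]


def unit_integration_args(slow=SLOW_SUITES):
    """Everything under tests/ except the slow suites; stop at first failure (-x)."""
    return ["tests/", "-v", *[f"--ignore={path}" for path in slow], "-x"]


def adversarial_acceptance_args(slow=SLOW_SUITES):
    """Only the slow suites, with short tracebacks."""
    return [*slow, "-v", "--tb=short"]


print("python -m pytest " + " ".join(unit_integration_args()))
print("python -m pytest " + " ".join(adversarial_acceptance_args()))
```

The two printed commands should match the `run:` lines added in the hunk above.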
@@ -300,3 +311,89 @@ jobs:
           pip install pip-audit pyyaml flask requests
           echo "=== Python Dependency Audit ==="
           pip-audit --strict --desc || echo "WARNING: Python dependencies have known vulnerabilities"
+
+  docs-validation:
+    name: Documentation Validation
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Check for broken internal links
+        run: |
+          echo "=== Checking internal doc links ==="
+          ERRORS=0
+          # Find all markdown links to local files
+          for md in $(find docs/ README.md CONTRIBUTING.md SECURITY.md -name '*.md' 2>/dev/null); do
+            # Extract relative links (not URLs, not anchors)
+            grep -oP '\[([^\]]*)\]\((?!https?://|#)([^)]+)\)' "$md" 2>/dev/null | \
+              grep -oP '\(([^)]+)\)' | tr -d '()' | while read -r link; do
+              # Strip anchor fragments
+              target="${link%%#*}"
+              [ -z "$target" ] && continue
+              # Resolve relative to file's directory
+              dir=$(dirname "$md")
+              resolved="${dir}/${target}"
+              if [ ! -f "$resolved" ] && [ ! -d "$resolved" ]; then
+                echo "BROKEN: ${md} -> ${link} (resolved: ${resolved})"
+                ERRORS=$((ERRORS + 1))
+              fi
+            done
+          done
+          if [ "$ERRORS" -gt 0 ]; then
+            echo "FAIL: ${ERRORS} broken internal links found"
+            exit 1
+          fi
+          echo "OK: All internal doc links valid"
+
+      - name: Verify required docs exist
+        run: |
+          echo "=== Checking required documentation ==="
+          REQUIRED_DOCS=(
+            "docs/threat-model.md"
+            "docs/architecture.md"
+            "docs/api.md"
+            "docs/security-status.md"
+            "docs/production-operations.md"
+            "docs/production-readiness-checklist.md"
+            "docs/slos.md"
+            "docs/release-policy.md"
+            "docs/support-lifecycle.md"
+            "docs/test-counts.json"
+            "docs/install/bare-metal.md"
+            "SECURITY.md"
+            "CONTRIBUTING.md"
+            "LICENSE"
+          )
+          ERRORS=0
+          for doc in "${REQUIRED_DOCS[@]}"; do
+            if [ -f "$doc" ]; then
+              echo "OK: $doc"
+            else
+              echo "MISSING: $doc"
+              ERRORS=$((ERRORS + 1))
+            fi
+          done
+          if [ "$ERRORS" -gt 0 ]; then
+            echo "FAIL: ${ERRORS} required document(s) missing"
+            exit 1
+          fi
+          echo "All required documents present"
+
+      - name: Validate test-counts.json format
+        run: |
+          python3 -c "
+          import json, sys
+          with open('docs/test-counts.json') as f:
+              data = json.load(f)
+          required = ['generated', 'go', 'go_total', 'python_total', 'grand_total']
+          for key in required:
+              if key not in data:
+                  print(f'FAIL: test-counts.json missing key: {key}')
+                  sys.exit(1)
+          if not isinstance(data['go'], dict):
+              print('FAIL: go field must be a dict of service -> count')
+              sys.exit(1)
+          print(f'OK: test-counts.json valid (total: {data[\"grand_total\"]} tests)')
+          "
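
One caveat on the broken-link step above: in bash, the commands of a pipeline normally run in subshells, so the `ERRORS` counter incremented inside `... | while read -r link` is probably not visible to the final `if [ "$ERRORS" -gt 0 ]` check, and the step may print `BROKEN:` lines yet still exit 0. The sketch below shows the same check in Python, where the result is collected in an ordinary list. It reuses the regular expression from the diff; the function name `find_broken_links` is hypothetical, and this is an illustration rather than the project's implementation.

```python
import re
from pathlib import Path

# Same pattern as the grep in the workflow: markdown links whose target is
# neither an http(s) URL nor a pure "#anchor".
LINK_RE = re.compile(r"\[([^\]]*)\]\((?!https?://|#)([^)]+)\)")


def find_broken_links(md_files):
    """Return (markdown file, link) pairs whose target does not exist."""
    broken = []
    for md in map(Path, md_files):
        text = md.read_text(encoding="utf-8")
        for match in LINK_RE.finditer(text):
            link = match.group(2)
            target = link.split("#", 1)[0]  # strip anchor fragment
            if not target:
                continue
            resolved = md.parent / target  # resolve relative to the file's directory
            if not (resolved.is_file() or resolved.is_dir()):
                broken.append((str(md), link))
    return broken
```

A CI step could call this over `docs/**/*.md` plus the top-level files and exit non-zero when the returned list is non-empty.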

README.md

Lines changed: 33 additions & 15 deletions
@@ -49,18 +49,26 @@ Built on [uBlue](https://universal-blue.org/) (Fedora Atomic / Silverblue). All
 ### Install (Fedora Atomic)

 ```bash
-# Rebase to unsigned image first
+# 1. Verify image signature before installing (requires cosign)
+cosign verify --key cosign.pub ghcr.io/sec_ai/secai_os:latest
+
+# 2. Bootstrap rebase (one-time unverified pull, see install docs for rationale)
 sudo rpm-ostree rebase ostree-unverified-registry:ghcr.io/sec_ai/secai_os:latest
 sudo systemctl reboot

-# Then rebase to signed image
+# 3. Switch to signed transport (all future updates verified automatically)
 sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/sec_ai/secai_os:latest
 sudo systemctl reboot

-# Set up encrypted vault
+# 4. Set up encrypted vault
 sudo /usr/libexec/secure-ai/setup-vault.sh /dev/sdX
 ```

+> **Why the two-step rebase?** The local ostree store doesn't have the signing policy
+> until the first boot. Step 1 provides out-of-band signature verification via cosign
+> before the unverified pull. Step 3 enables automatic verification for all future updates.
+> See [docs/install/bare-metal.md](docs/install/bare-metal.md) for full details.
+
 See [docs/install/](docs/install/) for detailed guides: [bare metal](docs/install/bare-metal.md) | [virtual machine](docs/install/vm.md) | [development](docs/install/dev.md)

 ### Get Your First Model
6674
### Get Your First Model
@@ -151,7 +159,7 @@ Every model passes through the same fully automatic pipeline:
 | **Updates** | Cosign-verified rpm-ostree, staged workflow, greenboot auto-rollback |
 | **Supply Chain** | Per-service CycloneDX SBOMs, SLSA3 provenance attestation, cosign-signed checksums |

-See [docs/threat-model.md](docs/threat-model.md) for threat classes, residual risks, and security invariants. See [docs/security-status.md](docs/security-status.md) for implementation status of all 45 milestones.
+See [docs/threat-model.md](docs/threat-model.md) for threat classes, residual risks, and security invariants. See [docs/security-status.md](docs/security-status.md) for implementation status of all 46 milestones.

 ### Verify Image Signatures

@@ -209,15 +217,20 @@ See [docs/policy-schema.md](docs/policy-schema.md) for full schema reference. Se

 ### CI Verification Evidence

-Each CI job produces specific security evidence:
-
-| Job | What It Proves |
-|-----|---------------|
-| `security-regression` | Adversarial test suite: prompt injection, policy bypass, containment |
-| `supply-chain-verify` | SBOM generation via Syft, cosign availability, provenance keywords |
-| `go-build-and-test` | 399 Go tests across 9 services with `-race` |
-| `python-test` | 718 Python tests (agent, adversarial, M5 acceptance, UI, pipeline) |
-| `test-count-check` | Prevents documented test counts from drifting below actual |
+All CI jobs are defined in [`.github/workflows/ci.yml`](.github/workflows/ci.yml). View the [latest CI run](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml).
+
+| Job | Workflow Link | What It Proves |
+|-----|--------------|---------------|
+| `go-build-and-test` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | 399 Go tests across 9 services with `-race` (build, test, vet) |
+| `python-test` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | 718 Python tests split into unit/integration + adversarial/acceptance, ruff lint, bandit security scan |
+| `security-regression` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Adversarial test suite: prompt injection, policy bypass, containment, recovery |
+| `supply-chain-verify` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | SBOM generation via Syft, cosign availability, provenance keywords in release/build workflows |
+| `test-count-check` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Prevents documented test counts from drifting below actual (source of truth: [test-counts.json](docs/test-counts.json)) |
+| `dependency-audit` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Go vulnerability scanning (govulncheck) + Python dependency audit (pip-audit) |
+| `shellcheck` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Static analysis of all shell scripts (first-boot, build, verify-release, etc.) |
+| `policy-validate` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | YAML schema validation for all policy and recipe files |
+| `check-pins` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Verifies all GitHub Actions are pinned to specific commit SHAs (not tags) |
+| `docs-validation` | [View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml) | Broken link detection, required docs presence, test-counts.json format validation |

 ---
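
The `check-pins` row describes a check that every `uses:` reference is pinned to a full commit SHA, as in `actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd` from the workflow diff. The project's actual check is not shown in this commit. The following is only a rough sketch of how such a check might look, with a hypothetical function name and the assumption that local (`./`) and `docker://` references are exempt:

```python
import re

# A "uses:" line, optionally as the first key of a step ("- uses: ...").
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(\S+)")
# A reference pinned to a full 40-character hexadecimal commit SHA.
SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return the action references that are not pinned to a commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1)
        # Assumption: local actions and container images are out of scope.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not SHA_RE.search(ref):
            unpinned.append(ref)
    return unpinned
```

Trailing comments such as `# v6.0.2` are ignored because the reference is captured only up to the first whitespace.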

@@ -229,7 +242,7 @@ Each CI job produces specific security evidence:
 | [Threat Model](docs/threat-model.md) | Threat classes, invariants, residual risks |
 | [API Reference](docs/api.md) | HTTP API for all services |
 | [Policy Schema](docs/policy-schema.md) | Full policy.yaml schema reference |
-| [Security Status](docs/security-status.md) | Implementation status of all 45 milestones |
+| [Security Status](docs/security-status.md) | Implementation status of all 46 milestones |
 | [Test Matrix](docs/test-matrix.md) | Test coverage: 1,117 tests across Go and Python (see [test-counts.json](docs/test-counts.json)) |
 | [Compatibility Matrix](docs/compatibility-matrix.md) | GPU, VM, and hardware support |
 | [Security Test Matrix](docs/security-test-matrix.md) | Security feature test coverage |
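
The README figures are internally consistent: 399 Go tests plus 718 Python tests give the 1,117 total quoted above. The format validation added to CI only checks that the keys of `test-counts.json` exist. A possible extension, sketched here under the assumption that `go_total` is the sum of the per-service `go` counts and `grand_total` is `go_total + python_total` (the file's real semantics are not shown in this commit), would also check the arithmetic:

```python
REQUIRED_KEYS = ("generated", "go", "go_total", "python_total", "grand_total")


def check_counts(data):
    """Return a list of problems found in a parsed test-counts.json dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    if problems:
        return problems
    if not isinstance(data["go"], dict):
        return ["go field must be a dict of service -> count"]
    # Assumed invariants, not confirmed by the source:
    if sum(data["go"].values()) != data["go_total"]:
        problems.append("go_total does not match the sum of per-service counts")
    if data["go_total"] + data["python_total"] != data["grand_total"]:
        problems.append("grand_total != go_total + python_total")
    return problems


# Illustrative data: the totals come from the README, while the per-service
# split and the date are made up for the example.
sample = {
    "generated": "1970-01-01",
    "go": {"service-a": 200, "service-b": 199},
    "go_total": 399,
    "python_total": 718,
    "grand_total": 1117,
}
print(check_counts(sample) or "OK")
```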
@@ -259,6 +272,10 @@ Each CI job produces specific security evidence:
 | [Recovery Runbook](docs/recovery-runbook.md) | Operator procedures for degradation, containment, and recovery |
 | [Sample Release Bundle](docs/sample-release-bundle.md) | Release artifact structure and verification commands |
 | [Production Operations](docs/production-operations.md) | First-boot checks, upgrades, key rotation, monitoring, capacity |
+| [Production Readiness Checklist](docs/production-readiness-checklist.md) | Formal release gate checklist for production deployments |
+| [SLOs](docs/slos.md) | Service level objectives: availability, latency, correctness targets |
+| [Release Policy](docs/release-policy.md) | Release channels (stable/candidate/dev), versioning, upgrade paths |
+| [Support Lifecycle](docs/support-lifecycle.md) | Hardware matrix, driver versions, support windows, deprecation policy |

 ### Install Guides

@@ -362,7 +379,7 @@ See [docs/test-matrix.md](docs/test-matrix.md) for full breakdown.
 ## Roadmap

 <details>
-<summary>All 44 project milestones (click to expand)</summary>
+<summary>All 46 project milestones (click to expand)</summary>

 - [x] **Milestone 0** -- Threat model, dataflow, invariants, policy files
 - [x] **Milestone 1** -- Bootable OS, encrypted vault, GPU drivers
@@ -410,6 +427,7 @@ See [docs/test-matrix.md](docs/test-matrix.md) for full breakdown.
 - [x] **Milestone 43** -- Stronger isolation: sandbox tightening, adversarial tests, CI security regression, MCP isolation, recovery ceremonies, M5 acceptance suite
 - [x] **Milestone 44** -- Auditability and documentation hardening: test-count drift CI check, CI evidence links and badges, M4/M5 terminology disambiguation, audit quick-path doc, recovery runbook, verify-release script, security/product roadmap split
 - [x] **Milestone 45** -- Production readiness hardening: incident persistence (file-backed), graceful shutdown for all Go services, HTTP timeouts, systemd production hardening, first-boot validation, audit log rotation, CI vulnerability scanning, production operations guide
+- [x] **Milestone 46** -- Operational maturity: bootstrap trust gap fix (cosign verify before rebase), CI runs on all changes (removed paths-ignore for .md), Python quality gates (ruff + bandit + split test suites), docs-validation CI job, production-readiness checklist, SLOs, release channel policy, support lifecycle, sample verification output

 </details>

docs/install/bare-metal.md

Lines changed: 51 additions & 6 deletions
@@ -65,19 +65,53 @@ Replace `/dev/sdX` or `/dev/rdiskN` with your actual USB device. Double-check th

 ## Step 4: Rebase to SecAI OS

-After booting into the fresh Fedora Silverblue installation, open a terminal and rebase to the SecAI OS image:
+After booting into the fresh Fedora Silverblue installation, open a terminal.
+
+### 4a. Verify image signature (before rebasing)
+
+Before installing the image, verify its authenticity using cosign:

 ```bash
-rpm-ostree rebase ostree-unverified-registry:ghcr.io/sec_ai/secai_os:latest
+# Install cosign (if not already present)
+sudo dnf install -y cosign
+
+# Fetch the project's public key
+curl -sSfL https://raw.githubusercontent.com/SecAI-Hub/SecAI_OS/main/cosign.pub -o /tmp/cosign.pub
+
+# Verify the image signature
+cosign verify --key /tmp/cosign.pub ghcr.io/sec_ai/secai_os:latest
 ```

-Wait for the rebase to complete, then reboot:
+You should see `The following checks were performed on each of these signatures: ...`
+with a successful verification result. **Do not proceed if verification fails.**
+
+### 4b. Bootstrap rebase
+
+> **Note on the bootstrap trust gap:** The first rebase must use
+> `ostree-unverified-registry:` because the local ostree store does not yet
+> have the SecAI signing policy configured. This is a one-time bootstrapping
+> step — the cosign verification above provides out-of-band attestation
+> before the unverified pull. After the first boot, all subsequent updates
+> use `ostree-image-signed:` and are verified automatically.

 ```bash
-systemctl reboot
+# Initial rebase (signature verified out-of-band above)
+sudo rpm-ostree rebase ostree-unverified-registry:ghcr.io/sec_ai/secai_os:latest
+sudo systemctl reboot
 ```

-After reboot, the system will be running SecAI OS.
+### 4c. Switch to signed updates
+
+After the first reboot, switch to the signed image transport so that all
+future updates are cryptographically verified by rpm-ostree:
+
+```bash
+# Switch to the signed transport (all future updates verified automatically)
+sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/sec_ai/secai_os:latest
+sudo systemctl reboot
+```
+
+After this reboot, the system is running SecAI OS with full signature verification enabled.

 ---
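
The ordering in steps 4a and 4b matters: the unverified rebase should only run if `cosign verify` succeeded. A minimal sketch of that gate follows, assuming the key has already been downloaded to `/tmp/cosign.pub` as in step 4a. The function names and the injectable `run` parameter are hypothetical and exist only so the logic can be exercised without touching a real system; the commands themselves are the ones from the install guide.

```python
import subprocess

IMAGE = "ghcr.io/sec_ai/secai_os:latest"


def bootstrap_commands(key_path="/tmp/cosign.pub", image=IMAGE):
    """Commands from steps 4a and 4b, in the order they must run."""
    return [
        ["cosign", "verify", "--key", key_path, image],
        ["sudo", "rpm-ostree", "rebase", f"ostree-unverified-registry:{image}"],
    ]


def run_bootstrap(run=subprocess.run):
    """Run the commands in order and stop at the first failure.

    Returns (ok, executed) so a caller can see whether the rebase was reached.
    """
    executed = []
    for cmd in bootstrap_commands():
        executed.append(cmd)
        if run(cmd).returncode != 0:
            return False, executed
    return True, executed
```

Calling `run_bootstrap()` with the default runner would execute the real commands, so it is intended for a freshly installed machine that still needs the manual reboot afterwards.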

@@ -97,7 +131,18 @@ You will be prompted to set a vault passphrase. This passphrase encrypts the LUK

 ## Step 6: First Boot Verification

-After firstboot completes, verify the installation:
+After firstboot completes, run the automated health check:
+
+```bash
+# Comprehensive health check (validates all services, endpoints, security posture)
+sudo /usr/libexec/secure-ai/first-boot-check.sh
+```
+
+This validates all core services are running, health endpoints respond, attestation
+state is verified, no open incidents exist, and no services are exposed on public
+interfaces. See [docs/production-operations.md](../production-operations.md) for details.
+
+You can also verify manually:

 ```bash
 # Check that all services are running
