Commit b7f25d4

SecAI-Hub and claude committed
Implement security priorities 5-8: incident recorder, GPU integrity, agent hardening, HSM keys

M38: Incident recorder + containment automation (Go service :8515, 9 incident classes, 4-state lifecycle, auto-containment per policy, 47 tests)
M39: GPU integrity deep integration (driver fingerprint + device allowlist probes, /v1/attest-state for runtime-attestor, incident-recorder auto-reporting, 81 tests)
M40: Agent verified supervisor hardening (HMAC-SHA256 signed capability tokens bound to task/intent/policy, nonce replay protection, token expiry, two-phase approval for high-risk actions, per-step PolicyDecision evidence in audit trail)
M41: HSM-backed key handling (keystore abstraction with software/TPM2/PKCS#11 backends, key rotation, auto-detection, keystore.yaml config)

Also includes: landlock entries, systemd units, seccomp profiles, CI/release matrix updates, build-services.sh entries, recipe.yml updates, component docs, and security status/test matrix documentation.

Tests: 159 agent (up from 93), 309+ Go across 9 services, ~970 total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 1fe9aa7 commit b7f25d4

75 files changed

Lines changed: 16142 additions & 35 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ jobs:
       contents: read
     strategy:
       matrix:
-        service: [airlock]
+        service: [airlock, registry, tool-firewall, gpu-integrity-watch, mcp-firewall, policy-engine, runtime-attestor, integrity-monitor, incident-recorder]
     steps:
       - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
       - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0

.github/workflows/release.yml

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
name: Release

on:
  push:
    tags: ["v*"]
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Dry run (skip publish)"
        type: boolean
        default: false

concurrency:
  group: release-${{ github.ref }}
  cancel-in-progress: false

permissions:
  contents: write
  packages: write
  id-token: write
  attestations: write

jobs:
  build-go:
    name: Build Go Services
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        service: [airlock, registry, tool-firewall, gpu-integrity-watch, mcp-firewall, policy-engine, runtime-attestor, integrity-monitor, incident-recorder]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: actions/setup-go@d35c59abb061a4a6fb18e82ac0862c26744d6ab5 # v5.5.0
        with:
          go-version: "1.23"
          cache-dependency-path: services/${{ matrix.service }}/go.sum

      - name: Build (linux/amd64)
        working-directory: services/${{ matrix.service }}
        run: |
          CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
            go build -ldflags="-s -w -X main.version=${{ github.ref_name }}" \
            -o ../../dist/${{ matrix.service }}-linux-amd64 .

      - name: Build (linux/arm64)
        working-directory: services/${{ matrix.service }}
        run: |
          CGO_ENABLED=0 GOOS=linux GOARCH=arm64 \
            go build -ldflags="-s -w -X main.version=${{ github.ref_name }}" \
            -o ../../dist/${{ matrix.service }}-linux-arm64 .

      - name: Generate SBOM (Syft)
        uses: anchore/sbom-action@17ae1740179002c89186b61233e0f892c3118b11 # v0.23.0
        with:
          path: services/${{ matrix.service }}
          format: cyclonedx-json
          output-file: dist/${{ matrix.service }}-sbom.cdx.json

      - name: Upload artifacts
        uses: actions/upload-artifact@ea165f8d65b6db9a6b7e75b195508afaf57ec3c7 # v4.6.2
        with:
          name: go-${{ matrix.service }}
          path: dist/

  build-python:
    name: Build Python Service SBOMs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: "3.12"

      - name: Install Syft
        run: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

      - name: Generate Python SBOMs
        run: |
          mkdir -p dist
          for svc in agent ui quarantine common; do
            if [ -d "services/${svc}" ]; then
              syft dir:services/${svc} -o cyclonedx-json=dist/${svc}-sbom.cdx.json
            fi
          done
          # Diffusion worker and search mediator
          for svc in diffusion-worker search-mediator; do
            if [ -d "services/${svc}" ]; then
              syft dir:services/${svc} -o cyclonedx-json=dist/${svc}-sbom.cdx.json
            fi
          done

      - name: Upload artifacts
        uses: actions/upload-artifact@ea165f8d65b6db9a6b7e75b195508afaf57ec3c7 # v4.6.2
        with:
          name: python-sboms
          path: dist/

  provenance:
    name: SLSA Provenance & Attestation
    runs-on: ubuntu-latest
    needs: [build-go, build-python]
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

      - name: Download all artifacts
        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
        with:
          path: dist/
          merge-multiple: true

      - name: Generate SHA256 checksums
        run: |
          cd dist
          sha256sum * > SHA256SUMS
          cat SHA256SUMS

      - name: Sign checksums with cosign
        run: |
          cosign sign-blob --yes \
            --key env://COSIGN_PRIVATE_KEY \
            --output-signature dist/SHA256SUMS.sig \
            dist/SHA256SUMS
        env:
          COSIGN_PRIVATE_KEY: ${{ secrets.SIGNING_SECRET }}

      - name: Attest build provenance
        uses: actions/attest-build-provenance@c074443f1aee8d4aeeae555aebba3282517141b2 # v2.2.3
        with:
          subject-path: "dist/*-linux-*"

      - name: Attest SBOMs
        run: |
          for sbom in dist/*-sbom.cdx.json; do
            service=$(basename "$sbom" -sbom.cdx.json)
            cosign attest --yes --type cyclonedx \
              --predicate "$sbom" \
              --key env://COSIGN_PRIVATE_KEY \
              ghcr.io/${{ github.repository }}:${{ github.ref_name }}-${service} || \
              echo "WARN: cosign attest skipped for ${service} (no matching image)"
          done
        env:
          COSIGN_PRIVATE_KEY: ${{ secrets.SIGNING_SECRET }}

      - name: Create GitHub Release
        if: ${{ !inputs.dry_run }}
        uses: softprops/action-gh-release@da05d552573ad5aba039eaac05058a918a7bf631 # v2.2.2
        with:
          files: |
            dist/*-linux-*
            dist/*-sbom.cdx.json
            dist/SHA256SUMS
            dist/SHA256SUMS.sig
          generate_release_notes: true
          fail_on_unmatched_files: false

README.md

Lines changed: 7 additions & 1 deletion
@@ -89,11 +89,17 @@ journalctl -u secure-ai-quarantine-watcher -f # watch pipeline
 | Diffusion Worker | 8455 | Python | Image and video generation |
 | Agent | 8476 | Python | Policy-bound local autopilot (deny-by-default, capability tokens) |
 | Quarantine | -- | Python | 7-stage verify, scan, and promote pipeline |
+| GPU Integrity Watch | 8495 | Go | Continuous GPU runtime verification and anomaly detection |
+| MCP Firewall | 8496 | Go | Model Context Protocol policy gateway (default-deny, taint tracking) |
+| Policy Engine | 8500 | Go | Unified policy decision point (6 domains, decision evidence, OPA-upgradeable) |
+| Runtime Attestor | 8505 | Go | TPM2 quote verification, HMAC-signed state bundles, startup gating |
+| Integrity Monitor | 8510 | Go | Continuous baseline-verified file watcher (binaries, policies, models, trust material) |
+| Incident Recorder | 8515 | Go | Security event capture, incident lifecycle, auto-containment |
 | Search Mediator | 8485 | Python | Tor-routed web search with PII stripping |
 | SearXNG | 8888 | Python | Self-hosted metasearch (privacy-respecting engines) |
 | Tor | 9050 | C | Anonymous SOCKS5 proxy |

-See [docs/architecture.md](docs/architecture.md) for design decisions and service dependencies. Per-service docs: [registry](docs/components/registry.md) | [tool-firewall](docs/components/tool-firewall.md) | [agent](docs/components/agent.md) | [airlock](docs/components/airlock.md) | [quarantine](docs/components/quarantine.md) | [search-mediator](docs/components/search-mediator.md)
+See [docs/architecture.md](docs/architecture.md) for design decisions and service dependencies. Per-service docs: [registry](docs/components/registry.md) | [tool-firewall](docs/components/tool-firewall.md) | [agent](docs/components/agent.md) | [airlock](docs/components/airlock.md) | [quarantine](docs/components/quarantine.md) | [search-mediator](docs/components/search-mediator.md) | [gpu-integrity-watch](docs/components/gpu-integrity-watch.md) | [mcp-firewall](docs/components/mcp-firewall.md) | [policy-engine](docs/components/policy-engine.md) | [runtime-attestor](docs/components/runtime-attestor.md) | [integrity-monitor](docs/components/integrity-monitor.md) | [incident-recorder](docs/components/incident-recorder.md)

 ### 7-Stage Quarantine Pipeline

docs/components/gpu-integrity-watch.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
# GPU Integrity Watch

**Service:** `secure-ai-gpu-integrity-watch.service`
**Binary:** `/usr/libexec/secure-ai/gpu-integrity-watch`
**Port:** 8495 (loopback only)
**Language:** Go

## Purpose

GPU Integrity Watch continuously verifies GPU runtime integrity. It monitors the GPU hardware and driver stack to detect tampering, unexpected changes, or anomalies that could compromise trust in model execution. It integrates with the runtime attestor and incident recorder for end-to-end GPU security.

## Architecture

GPU Integrity Watch runs as a daemon that periodically probes the GPU subsystem and scores the results against a trusted baseline. If the score exceeds a configurable threshold, it triggers degradation actions and reports incidents.

```
+-----------+     +-----------+     +-----------+     +--------------------+
|  Probes   | --> |  Scoring  | --> |  Actions  | --> | Integrations       |
| (6 types) |     | (weighted |     | (degrade, |     | - incident-recorder|
|           |     |  history) |     |  alert,   |     | - runtime-attestor |
|           |     |           |     |  disable) |     |                    |
+-----------+     +-----------+     +-----------+     +--------------------+
```

## Probes

| Probe | Type | Default Weight | What It Checks |
|-------|------|----------------|----------------|
| Tensor Hash | `tensor_hash` | 1.0 | SHA-256 of model files vs baseline |
| Sentinel Inference | `sentinel_inference` | 1.0 | Known input/output pairs for behavioral consistency |
| Reference Drift | `reference_drift` | 0.8 | Multi-pass variance detection (corruption signature) |
| ECC Status | `ecc_status` | 0.6 | GPU memory error counters (nvidia-smi) |
| Driver Fingerprint | `driver_fingerprint` | 1.0 | GPU driver version + kernel module identity vs baseline |
| Device Allowlist | `device_allowlist` | 0.8 | GPU device nodes (/dev/dri/*, /dev/nvidia*) vs expected list |

### Verdict Classification

| Composite Score | Verdict |
|----------------|---------|
| 0.0 - 0.3 | `healthy` |
| 0.3 - 0.9 | `warning` |
| >= 0.9 or any probe `fail` | `critical` |

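
The verdict table can be read as a small decision function. The following is a minimal sketch, not the service's actual code: the function name `classify` and its parameters are hypothetical, and because the table lists 0.3 in both the `healthy` and `warning` rows, treating exactly 0.3 as `warning` is an assumption.

```go
package main

import "fmt"

// Verdict labels as they appear in the Verdict Classification table.
const (
	Healthy  = "healthy"
	Warning  = "warning"
	Critical = "critical"
)

// classify maps a composite score to a verdict (hypothetical helper).
// Assumption: a score of exactly 0.3 is reported as "warning"; the
// table is ambiguous on that boundary.
func classify(score float64, anyProbeFailed bool) string {
	switch {
	case anyProbeFailed || score >= 0.9:
		// Any failing probe forces "critical" regardless of score.
		return Critical
	case score >= 0.3:
		return Warning
	default:
		return Healthy
	}
}

func main() {
	fmt.Println(classify(0.1, false))
	fmt.Println(classify(0.5, false))
	fmt.Println(classify(0.2, true))
}
```

Checking the probe-failure override first keeps a low composite score from masking a single hard failure, which matches the "any probe `fail`" row of the table.
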
## Integrations

### Runtime Attestor

The `/v1/attest-state` endpoint returns a `GPUAttestState` summary that the runtime attestor can include in the signed attestation bundle:

```json
{
  "timestamp": "2026-03-13T12:00:00Z",
  "verdict": "healthy",
  "composite_score": 0.0,
  "probe_statuses": {"hash": "pass", "driver": "pass"},
  "driver_version": "565.57.01",
  "device_nodes": ["/dev/dri/card0", "/dev/dri/renderD128"],
  "trend": 0.0
}
```

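
A consumer such as the runtime attestor could decode this payload into a typed value. The sketch below assumes a Go struct whose JSON tags match the example response; the Go field names, field types, and the `parseAttestState` helper are illustrative assumptions, not the service's definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// GPUAttestState mirrors the example JSON response.
// Field names and types on the Go side are assumptions.
type GPUAttestState struct {
	Timestamp      time.Time         `json:"timestamp"`
	Verdict        string            `json:"verdict"`
	CompositeScore float64           `json:"composite_score"`
	ProbeStatuses  map[string]string `json:"probe_statuses"`
	DriverVersion  string            `json:"driver_version"`
	DeviceNodes    []string          `json:"device_nodes"`
	Trend          float64           `json:"trend"`
}

// parseAttestState decodes a /v1/attest-state response body (hypothetical helper).
func parseAttestState(data []byte) (GPUAttestState, error) {
	var s GPUAttestState
	err := json.Unmarshal(data, &s)
	return s, err
}

// sample is the example payload from the documentation.
const sample = `{
  "timestamp": "2026-03-13T12:00:00Z",
  "verdict": "healthy",
  "composite_score": 0.0,
  "probe_statuses": {"hash": "pass", "driver": "pass"},
  "driver_version": "565.57.01",
  "device_nodes": ["/dev/dri/card0", "/dev/dri/renderD128"],
  "trend": 0.0
}`

func main() {
	s, err := parseAttestState([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s.Verdict, s.DriverVersion, len(s.DeviceNodes))
}
```
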
### Incident Recorder

On `warning` or `critical` verdicts, GPU Integrity Watch automatically reports incidents to the incident-recorder service (`http://127.0.0.1:8515`). Incident classes are mapped from probe failures:

| Probe Failure | Incident Class | Severity |
|--------------|----------------|----------|
| Tensor hash fail | `manifest_mismatch` | critical |
| ECC uncorrected errors | `integrity_violation` | critical |
| Driver fingerprint change | `integrity_violation` | high |
| Device allowlist fail | `integrity_violation` | high |
| Other anomalies | `model_behavior_anomaly` | high |

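
The mapping table above amounts to a lookup keyed by probe type. Here is a minimal sketch under stated assumptions: the `Incident` type and `classifyIncident` function are hypothetical names, the probe identifiers are taken from the Probes table, and every `ecc_status` failure is assumed to mean uncorrected errors.

```go
package main

import "fmt"

// Incident holds the class and severity reported to the incident recorder.
// The type name and fields are illustrative assumptions.
type Incident struct {
	Class    string
	Severity string
}

// classifyIncident maps a failed probe type to an incident, following the
// mapping table. Assumption: an ecc_status failure always represents
// uncorrected ECC errors.
func classifyIncident(probeType string) Incident {
	switch probeType {
	case "tensor_hash":
		return Incident{Class: "manifest_mismatch", Severity: "critical"}
	case "ecc_status":
		return Incident{Class: "integrity_violation", Severity: "critical"}
	case "driver_fingerprint", "device_allowlist":
		return Incident{Class: "integrity_violation", Severity: "high"}
	default:
		// Any other anomaly, for example sentinel_inference or reference_drift.
		return Incident{Class: "model_behavior_anomaly", Severity: "high"}
	}
}

func main() {
	for _, p := range []string{"tensor_hash", "driver_fingerprint", "reference_drift"} {
		inc := classifyIncident(p)
		fmt.Printf("%s -> %s (%s)\n", p, inc.Class, inc.Severity)
	}
}
```
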
## Configuration

- **Profile:** `/etc/secure-ai/gpu-integrity/default-profile.yaml`
- **Baseline:** `/var/lib/secure-ai/gpu-integrity/baseline.yaml`
- **Audit log:** `/var/lib/secure-ai/logs/gpu-integrity-audit.jsonl`

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `INTEGRITY_PROFILE` | `profiles/default-profile.yaml` | Profile YAML path |
| `SERVICE_TOKEN` | (none) | Bearer token for protected endpoints |
| `AUDIT_LOG` | (none) | JSONL audit log path |
| `INCIDENT_RECORDER_URL` | (from profile) | Override incident-recorder URL |

## CLI Commands

```bash
gpu-integrity-watch check      # Run probes once, exit 0/1/2
gpu-integrity-watch watch      # Continuous foreground monitoring
gpu-integrity-watch daemon     # HTTP daemon + background monitoring
gpu-integrity-watch baseline   # Capture baseline hashes
gpu-integrity-watch status     # Query daemon status
```

## Actions

| Action | Type | Trigger | Effect |
|--------|------|---------|--------|
| Alert | `alert` | warning | Send webhook or log alert |
| Reload | `reload` | warning | Signal inference server to reload model |
| Quarantine | `quarantine` | critical | Move model files to quarantine directory |
| Fail Closed | `fail_closed` | critical | Shut down inference server |

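
One way to picture the Actions table is as a list filtered by the current verdict. This is only a sketch: the `action` type, `actionTable`, and `actionsFor` are hypothetical names, and it assumes triggers match the verdict exactly, so a `critical` verdict does not also run the `warning` actions. The real service may escalate differently.

```go
package main

import "fmt"

// action mirrors one row of the Actions table (illustrative type).
type action struct {
	Type    string
	Trigger string
}

// actionTable reproduces the documented action types and their triggers.
var actionTable = []action{
	{Type: "alert", Trigger: "warning"},
	{Type: "reload", Trigger: "warning"},
	{Type: "quarantine", Trigger: "critical"},
	{Type: "fail_closed", Trigger: "critical"},
}

// actionsFor returns the action types whose trigger equals the verdict.
// Assumption: exact matching, with no escalation from warning to critical.
func actionsFor(verdict string) []string {
	var out []string
	for _, a := range actionTable {
		if a.Trigger == verdict {
			out = append(out, a.Type)
		}
	}
	return out
}

func main() {
	fmt.Println(actionsFor("warning"))
	fmt.Println(actionsFor("critical"))
}
```
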
## API Endpoints

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/health` | No | Liveness check |
| POST | `/v1/check` | No | Trigger full probe cycle |
| GET | `/v1/status` | No | Latest verdict, trend, probes, actions |
| GET | `/v1/history` | No | Score history array |
| GET | `/v1/metrics` | No | Counter metrics |
| GET | `/v1/attest-state` | No | GPU attestation state for runtime-attestor |
| POST | `/v1/baseline` | Token | Recapture baseline from model directory |
| POST | `/v1/reload` | Token | Reload profile and baseline from disk |

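
For the token-protected endpoints, a client would need to attach the service token to each request. The sketch below builds (but does not send) a `POST /v1/baseline` request. It assumes the token travels in a standard `Authorization: Bearer` header, which the document implies but does not spell out; `newBaselineRequest` and `baseURL` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// baseURL uses the documented loopback port 8495.
const baseURL = "http://127.0.0.1:8495"

// newBaselineRequest builds a POST /v1/baseline request (hypothetical helper).
// Assumption: the service expects "Authorization: Bearer <token>".
func newBaselineRequest(base, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, base+"/v1/baseline", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// SERVICE_TOKEN is the environment variable named in the configuration table.
	req, err := newBaselineRequest(baseURL, os.Getenv("SERVICE_TOKEN"))
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` would send it; the sketch stops short of that so it can run without a live daemon.
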
## Systemd Hardening

| Mechanism | Setting |
|-----------|---------|
| Dynamic user | `DynamicUser=yes` |
| Filesystem | `ProtectSystem=strict`, `ProtectHome=yes` |
| Network | `RestrictAddressFamilies=AF_UNIX AF_INET`, localhost only |
| Capabilities | `CapabilityBoundingSet=` (empty) |
| Memory | `MemoryDenyWriteExecute=yes` |
| Seccomp | Custom seccomp-BPF profile |
| Landlock | Read: `/etc/secure-ai`, `/sys/class/drm`, `/sys/bus/pci/devices`, `/dev/dri`; Write: `/var/lib/secure-ai/logs`, `/var/lib/secure-ai/gpu-integrity` |

## Test Coverage

81 tests covering:
- Tensor hash probes (5 tests)
- Sentinel inference/drift probes (3 tests)
- ECC status parsing (5 tests)
- Similarity computation (4 tests)
- Scoring engine (7 tests)
- Action execution (5 tests)
- Integration pipeline (2 tests)
- HTTP endpoints (10 tests)
- Token authentication (3 tests)
- Driver fingerprint probes (5 tests)
- Device allowlist probes (5 tests)
- Attestation state building (3 tests)
- Incident classification (4 tests)
- New probe integration (2 tests)
- Scoring with new weights (1 test)

```bash
cd services/gpu-integrity-watch && go test -v -race ./...
```

## Related

- [Runtime Attestor](runtime-attestor.md) -- consumes GPU attest-state
- [Incident Recorder](incident-recorder.md) -- receives GPU integrity incidents
- [Integrity Monitor](integrity-monitor.md) -- continuous file integrity
- [Architecture](../architecture.md) -- system design overview
- [Threat Model](../threat-model.md) -- GPU-related threat classes
