Commit d8165b7

incident-management: tighten IR template structure and pipeline runbook
1 parent ebfce1e, commit d8165b7

10 files changed

Lines changed: 365 additions & 112 deletions

docs/pages/incident-management/incident-response-template/incident-response-policy.mdx

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -158,7 +158,7 @@ See [Runbooks](./runbooks/overview) for step-by-step guides for specific inciden
158158
**Goal:** Confirm the fix actually worked.
159159

160160
- Verify immediately after deployment
161-
- Monitor for at least a week
161+
- Monitor based on residual risk, blast radius, and incident type
162162
- Consider adding new alerts or test cases
163163
- Document what monitoring is now in place
164164

docs/pages/incident-management/incident-response-template/overview.mdx

Lines changed: 11 additions & 0 deletions
@@ -78,6 +78,17 @@ That's it to start. Add complexity only as you need it.
7878

7979
Review and adapt these pages for your own internal incident response documentation.
8080

81+
This section is different from the broader [Incident Management](/incident-management/overview) guidance:
82+
83+
- **Incident Management pages** explain concepts and practices
84+
- **Incident Response Template pages** are meant to be copied, customized, and used internally
85+
86+
Within this template section:
87+
88+
- **Policy / roles / communications / contacts** define your operating model
89+
- **Templates** are blank working documents to fill out during or after incidents
90+
- **Runbooks** are scenario-specific response procedures
91+
8192
### What's Included
8293

8394
| Document | Purpose |

docs/pages/incident-management/incident-response-template/roles-and-staffing.mdx

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ Regardless of team size, define who can make high-stakes decisions during P1 inc
168168
| | | |
169169
| | | |
170170

171-
These people should be reachable 24/7 for critical incidents. Consider:
171+
There should be a 24/7 escalation path to these people for critical incidents. Consider:
172172
- Founders / C-level
173173
- Security Lead
174174
- Engineering Lead

docs/pages/incident-management/incident-response-template/runbooks/build-pipeline-compromise.mdx

Lines changed: 142 additions & 28 deletions
@@ -1,6 +1,6 @@
11
---
22
title: "Runbook: Build Pipeline Compromise | Security Alliance"
3-
description: "Stub runbook. Customize with your CI/CD platform and procedures."
3+
description: "Example runbook for CI/CD compromise. Review and customize for your platform, release process, and trust boundaries before use."
44
tags:
55
- Security Specialist
66
- Operations & Strategy
@@ -21,7 +21,7 @@ import { TagList, AttributionList, TagProvider, TagFilter, ContributeFooter } fr
2121
<TagList tags={frontmatter.tags} />
2222
<AttributionList contributors={frontmatter.contributors} />
2323

24-
> **Stub runbook.** Customize with your CI/CD platform and procedures.
24+
> **This is an example runbook.** Review and customize for your CI/CD platform, artifact flow, deployment model, and approval process before use.
2525
2626
## Quick Reference
2727

@@ -37,36 +37,150 @@ import { TagList, AttributionList, TagProvider, TagFilter, ContributeFooter } fr
3737

3838
### Symptoms
3939

40-
- [ ] Unexpected code in deployed artifacts
41-
- [ ] CI/CD configuration changed without approval
42-
- [ ] Secrets accessed or exfiltrated
43-
- [ ] Unauthorized workflow runs
40+
- [ ] Unexpected workflow runs or releases
41+
- [ ] CI/CD configuration changed without expected approval
42+
- [ ] Secrets accessed, exported, or rotated unexpectedly
43+
- [ ] Build artifacts differ from expected source or prior reproducible output
44+
- [ ] Deployments reference an unexpected commit, artifact, or builder identity
45+
46+
### Likely Scope Questions
47+
48+
- Is this limited to CI configuration, or were artifacts actually produced from a compromised pipeline?
49+
- Did the pipeline have deploy permissions, signing authority, or production credentials?
50+
- Were any releases, containers, frontend bundles, or packages published during the exposure window?
51+
52+
### Differentiation
53+
54+
- Unauthorized code merged without pipeline abuse may be a repository compromise first
55+
- Malicious package updates without CI tampering may be a dependency incident first
56+
- A bad deployment from a legitimate commit may be an operational failure rather than compromise
4457

45-
### Confirm Compromise
4658

47-
- Review CI/CD audit logs
48-
- Compare build artifacts to source
49-
- Check for config changes in CI/CD platform
5059
## Immediate Actions
5160

52-
1. [ ] Disable compromised pipelines
53-
2. [ ] Rotate all secrets and tokens
54-
3. [ ] Take down potentially compromised deployments
55-
4. [ ] Audit recent builds and deployments
56-
## Mitigation
57-
58-
1. [ ] Audit CI/CD configuration for unauthorized changes
59-
2. [ ] Rebuild from trusted commit using clean pipeline
60-
3. [ ] Implement additional approval requirements
61-
4. [ ] Review and restrict pipeline permissions
62-
## Prevention
63-
64-
- [ ] Require approval for CI/CD config changes
65-
- [ ] Use short-lived credentials
66-
- [ ] Implement branch protection
67-
- [ ] Audit pipeline access regularly
68-
- [ ] Use signed commits
69-
- [ ] Separate build and deploy permissions
61+
### Step 1: Freeze the pipeline
62+
63+
**Why:** Stop additional malicious builds, releases, or secret access.
64+
65+
- [ ] Disable affected workflows/pipelines
66+
- [ ] Revoke or pause auto-deploy jobs
67+
- [ ] Block manual approvals until scope is understood
68+
69+
### Step 2: Preserve evidence
70+
71+
**Why:** CI audit logs, workflow definitions, artifact metadata, and deployment history are easy to overwrite.
72+
73+
- [ ] Export CI audit logs
74+
- [ ] Save workflow/job history for the exposure window
75+
- [ ] Record affected commits, workflow files, artifact digests, release IDs, and deployment targets
76+
- [ ] Preserve runner details if self-hosted runners were involved
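
One possible way to record artifact digests for the evidence log is to hash every preserved file. The sketch below is illustrative only: it assumes the artifacts have already been copied into a local evidence directory, and the function names `sha256_digest` and `record_digests` are hypothetical, not part of the runbook or any CI platform.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def record_digests(evidence_dir: Path) -> dict[str, str]:
    """Map each file under evidence_dir (by relative POSIX path) to its digest."""
    return {
        path.relative_to(evidence_dir).as_posix(): sha256_digest(path)
        for path in sorted(evidence_dir.rglob("*"))
        if path.is_file()
    }
```

Storing the resulting mapping alongside the exported logs gives later investigation steps a fixed reference for what was preserved.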
77+
78+
### Step 3: Rotate credentials by blast radius
79+
80+
**Why:** Pipeline compromise often becomes credential compromise.
81+
82+
Prioritize rotation of:
83+
- [ ] CI platform tokens
84+
- [ ] cloud deploy credentials
85+
- [ ] package registry tokens
86+
- [ ] artifact signing keys or release credentials
87+
- [ ] secrets available to self-hosted runners
88+
89+
### Step 4: Stop trust in recent outputs
90+
91+
**Why:** Do not assume recent artifacts or deployments are clean.
92+
93+
- [ ] Identify all artifacts built during the exposure window
94+
- [ ] Identify all deployments and releases from those artifacts
95+
- [ ] Quarantine or withdraw suspicious outputs where possible
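
Identifying artifacts built during the exposure window can be sketched as a simple time filter over exported build history. This is a minimal example under stated assumptions: `BuildRecord` and its fields are hypothetical stand-ins for whatever your CI platform exports, and the window boundaries come from your own investigation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical record exported from CI job history."""

    artifact_digest: str
    commit: str
    finished_at: datetime


def builds_in_window(
    records: list[BuildRecord], start: datetime, end: datetime
) -> list[BuildRecord]:
    """Return records that finished inside [start, end], oldest first."""
    hits = [record for record in records if start <= record.finished_at <= end]
    return sorted(hits, key=lambda record: record.finished_at)
```

Use timezone-aware timestamps consistently; mixing naive and aware `datetime` values raises a `TypeError` on comparison. When the window is uncertain, widening it is the more cautious choice.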
96+
97+
98+
## Investigation
99+
100+
### Key Questions
101+
102+
- [ ] What was the initial access path: CI platform, repository permissions, runner compromise, or stolen token?
103+
- [ ] What permissions did the compromised pipeline actually have?
104+
- [ ] Were secrets exposed only to logs/runtime, or used to publish or deploy?
105+
- [ ] Which environments were reachable: build only, staging, production?
106+
- [ ] Which outputs must now be treated as untrusted?
107+
108+
### Information to Gather
109+
110+
| Data | Source |
111+
|------|--------|
112+
| CI audit logs | CI/CD platform |
113+
| workflow/config diffs | repository history |
114+
| release/deployment history | CI/CD platform, cloud provider, registry |
115+
| artifact digests / provenance | registry, signing system, artifact store |
116+
| runner access and execution logs | runner host / CI platform |
117+
118+
119+
## Containment and Recovery
120+
121+
### Option A: Rebuild from a known-good commit using a clean pipeline
122+
123+
**When:** You can identify a trusted commit and re-establish a trusted build path.
124+
**Impact:** Release cadence slows, but trust is restored more safely.
125+
126+
1. Stand up a clean pipeline or isolated builder
127+
2. Re-verify repository state and workflow definitions
128+
3. Rebuild from a known-good commit
129+
4. Compare output metadata against expected source and release intent
130+
5. Redeploy only from the rebuilt trusted output
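
The comparison in step 4 might be sketched as a digest diff between the expected outputs and the rebuilt ones. This assumes your builds are reproducible enough for digests to match; if they are not, compare provenance or other metadata instead. The function name `compare_digests` and the name-to-digest mappings are hypothetical.

```python
from __future__ import annotations


def compare_digests(
    expected: dict[str, str], rebuilt: dict[str, str]
) -> dict[str, list[str]]:
    """Report artifacts that are missing, unexpected, or differ by digest."""
    missing = sorted(name for name in expected if name not in rebuilt)
    unexpected = sorted(name for name in rebuilt if name not in expected)
    mismatched = sorted(
        name
        for name in expected
        if name in rebuilt and expected[name] != rebuilt[name]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}
```

Any non-empty list in the report is a reason to hold the redeploy and investigate before proceeding.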
131+
132+
### Option B: Roll back to last known-good release
133+
134+
**When:** A trusted prior release exists and rollback is operationally safe.
135+
**Impact:** Feature loss or temporary service degradation may occur.
136+
137+
1. Identify the last trusted artifact and deployment
138+
2. Roll back affected services
139+
3. Verify rollback success in production
140+
4. Continue investigation before resuming normal release flow
141+
142+
### Option C: Keep service paused until trust is re-established
143+
144+
**When:** You cannot distinguish clean from compromised outputs.
145+
**Impact:** Operational disruption, but lower risk of serving malicious artifacts.
146+
147+
1. Pause releases/deployments
148+
2. Communicate impact internally and externally as needed
149+
3. Rebuild trust in source, pipeline, credentials, and artifacts before resuming
150+
151+
152+
## Verification Before Resuming
153+
154+
Do not resume normal delivery until you can answer these clearly:
155+
156+
- [ ] The initial access path is understood well enough to prevent immediate recurrence
157+
- [ ] Compromised credentials have been rotated or invalidated
158+
- [ ] Untrusted artifacts and releases have been identified and handled
159+
- [ ] Build and deploy permissions are re-scoped appropriately
160+
- [ ] A known-good artifact has been rebuilt or a known-good release has been restored
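
If you track this checklist in tooling, the gate could be expressed as a small check that lists what is still unresolved. This is only a sketch: the check names in `RESUME_CHECKS` are hypothetical labels for the five items above, and the status values are assumed to be set by a human reviewer.

```python
from __future__ import annotations

# Hypothetical labels, one per checklist item above.
RESUME_CHECKS = (
    "initial_access_understood",
    "credentials_rotated",
    "untrusted_outputs_handled",
    "permissions_rescoped",
    "known_good_restored",
)


def unresolved_checks(status: dict[str, bool]) -> list[str]:
    """Return checklist items not yet confirmed; absent items count as unresolved."""
    return [name for name in RESUME_CHECKS if not status.get(name, False)]


def may_resume(status: dict[str, bool]) -> bool:
    """Allow resuming delivery only when every checklist item is confirmed."""
    return not unresolved_checks(status)
```

Treating a missing entry as unresolved keeps the gate closed by default, which matches the intent of the checklist.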
161+
162+
163+
## Hardening After the Incident
164+
165+
- [ ] Separate build permissions from deploy permissions
166+
- [ ] Require stronger approval controls for workflow and release changes
167+
- [ ] Use short-lived credentials where possible
168+
- [ ] Reduce secret exposure to only the jobs that need them
169+
- [ ] Restrict or harden self-hosted runners if used
170+
- [ ] Improve artifact provenance, signing, and release verification
171+
172+
173+
## Escalation
174+
175+
Escalate immediately if:
176+
- [ ] production deployments may have been modified
177+
- [ ] signing keys or release credentials may be exposed
178+
- [ ] user-facing artifacts may have been maliciously published
179+
- [ ] the pipeline had access to broader cloud or infrastructure credentials
180+
181+
See [Contacts](../contacts) and [Incident Response Policy](../incident-response-policy).
182+
183+
70184
## Related
71185

72186
- [Frontend Compromise](./frontend-compromise)
