This document describes a manual, AI-executable integration test flow for KMS self-authorization and quote-required KMS behavior.
The goal is to validate the following behaviors without depending on kms/e2e/ from PR #538:
- Bootstrap self-check: a KMS must call the auth API and verify that it is itself allowed before bootstrap succeeds.
- Onboard receiver-side source check: a new KMS must reject onboarding if the source KMS is not allowed by the receiver's auth policy.
- Trusted RPC self-check: trusted KMS RPCs such as GetTempCaCert, GetKmsKey, GetAppKey, and SignCert must fail when the running KMS is no longer allowed by its auth policy.
- Attestation requirement: KMS always requires attestation; for local development without TDX hardware, use sdk/simulator.
This guide is written as a deployment-and-test runbook so an AI agent can follow it end-to-end.
Execution notes from real runs on teepod2 (2026-03-19):
- Do not assume a host-local auth-simple instance is reachable from a CVM. In practice, the auth API must be:
  - publicly reachable by the CVM, or
  - deployed as a sidecar/internal service inside the same test environment.
- dstack CVMs use QEMU user-mode networking: the host is reachable at 10.0.2.2 from inside the CVM.
- For PR validation, prefer a prebuilt KMS test image.
- Boot Progress: done only means the VM guest boot finished. It does not guarantee the KMS onboard endpoint is already ready.
- If you inject helper scripts through docker-compose.yaml, prefer inline configs.content over configs.file unless you have confirmed the extra files are copied into the deployment bundle.
- The onboard completion endpoint is GET /finish, not POST.
- Do not reuse a previously captured mr_aggregated across redeploys. Auth policies must be generated from the attestation of the current VM under test.
- KMS now always requires quote/attestation. For local development without TDX hardware, use sdk/simulator instead of trying to run a no-attestation KMS flow.
- For auth-simple, kms.mrAggregated = [] is a deny-all policy for KMS. Use that as the baseline deny configuration, then add the measured KMS MR values for allow cases.
- Port forwarding is simpler than gateway for testing. Using --gateway requires the auth API to return a valid gatewayAppId, which adds unnecessary complexity. Use --port tcp:0.0.0.0:<host-port>:8000 instead.
- Remote KMS attestation previously had an empty osImageHash. Fixed: RA-TLS certs now use the unified PHALA_RATLS_ATTESTATION format, which preserves vm_config. For old source KMS instances that still use the legacy cert format, the receiver-side ensure_kms_allowed automatically fills osImageHash from the local KMS's own value. No special "0x" entry in osImages is needed anymore.
- The source_url in the Onboard.Onboard request must use an address reachable from inside the CVM (e.g., https://10.0.2.2:<port>/prpc), not 127.0.0.1, which is the CVM's own loopback.
- Why this document exists
- Test strategy
- Topology
- Prerequisites
- Shared setup
- Test case 1: bootstrap is denied when self is not allowed
- Test case 2: bootstrap succeeds after self is whitelisted
- Test case 3: receiver rejects onboarding from a denied source KMS
- Test case 4: trusted RPCs fail when the running KMS is no longer allowed
- Test case 5: local development should use the simulator
- Evidence to capture
- Cleanup
This guide provides a standalone test procedure that does not depend on a dedicated e2e framework. It uses:
- existing KMS deploy flows
- auth-simple as a controllable auth API
- manual RPC calls via curl
This exercises real deployment paths with minimal dependencies.
Use real KMS CVMs with a hot-reloadable auth-simple policy.
Why auth-simple:
- it implements the same /bootAuth/kms webhook contract used by KMS
- its config is re-read on every request
- allow/deny behavior can be changed without restarting the service
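Because the config is re-read on every request, a plain cp that is still writing could in principle be observed half-written. The sketch below swaps the policy file with a same-directory rename instead. The function name apply_policy is hypothetical and not part of auth-simple; whether auth-simple would actually misbehave on a partial read has not been verified here.

```shell
# Hypothetical helper (not part of auth-simple): replace the live policy file
# via a same-directory rename, so a reader sees either the old or the new
# file and never a partially written one.
# Usage: apply_policy <new-policy.json> <live-config.json>
apply_policy() {
  ap_src=$1
  ap_dst=$2
  # Refuse to install a missing or empty policy file.
  [ -s "$ap_src" ] || { echo "apply_policy: missing or empty: $ap_src" >&2; return 1; }
  # The temp file lives next to the target so mv is a rename, not a copy.
  ap_tmp="$ap_dst.tmp.$$"
  cp "$ap_src" "$ap_tmp" && mv -f "$ap_tmp" "$ap_dst"
}
```

For example, apply_policy /tmp/kms-self-auth/auth-src-deny-self.json /tmp/kms-self-auth/auth-src.json has the same effect as the cp commands used later in this guide.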
The test intentionally focuses on authorization decisions, not on a new Rust test harness.
Use the following layout:
Host / operator machine
├── auth-simple-src (source KMS auth policy)
├── auth-simple-dst (target KMS auth policy)
├── kms-src (bootstrapped, later used as source KMS)
└── kms-dst (fresh KMS used for onboard tests)
Policy responsibilities:
- auth-simple-src must authorize:
  - kms-src itself, for bootstrap and trusted RPC self-checks
  - kms-dst, when kms-dst calls GetKmsKey during onboarding
- auth-simple-dst decides whether kms-dst accepts kms-src as an allowed source KMS
Before starting, make sure the following are available:
- A KMS image built from current master (includes PR #573 auth checks, #579 mandatory attestation, #581 dedup refactor)
- A working dstack-vmm or teepod deployment target
- Two routable KMS onboard URLs
- bun installed on the host, because kms/auth-simple runs on Bun
- jq, curl, and Python 3 on the host
Recommended references:
- KMS deployment tutorial: docs/tutorials/kms-cvm-deployment.md
- KMS troubleshooting: docs/tutorials/troubleshooting-kms-deployment.md
- auth-simple usage: kms/auth-simple/README.md
If deploying on teepod/dstack-vmm, the easiest pattern is:
- deploy KMS in onboard mode
- expose the onboard page through gateway
- call /prpc/Onboard.*?json via HTTPS
Strong recommendation for this manual test:
- publish a test KMS image first, then deploy that image
- avoid build: in docker-compose.yaml unless you have already confirmed image builds work correctly in your VMM environment
Using a prebuilt image significantly reduces ambiguity when a failure happens: you can focus on KMS authorization logic rather than image build or registry behavior.
If you use teepod gateway instead of port forwarding:
- onboard mode: use the -8000 style URL (plain HTTP)
- runtime TLS KMS RPC after bootstrap/onboard: use the -8000s style URL (TLS passthrough)
Do not assume the same external URL works before and after onboarding is finished.
export REPO_ROOT="$(git rev-parse --show-toplevel)"
mkdir -p /tmp/kms-self-auth
cd /tmp/kms-self-auth

The original plan was to run two host-local auth-simple processes. In practice, this only works if the CVMs can reach that host directly.
Choose one of these options:
- Preferred: run auth-simple on the operator host and point KMS at http://10.0.2.2:<port> (QEMU host gateway). This is the simplest option if the CVMs use QEMU user-mode networking.
- Also fine: deploy the auth API as a separate public service or CVM
- Sidecar: run the auth API as a sidecar in the same KMS test deployment
If you use the sidecar/public-service pattern, keep the same logical split:
- source-side auth policy
- destination-side auth policy
and make sure you still have a way to update allow/deny policy during the test.
cd "$REPO_ROOT/kms/auth-simple"
bun install

Create placeholder configs:
cat > /tmp/kms-self-auth/auth-src.json <<'EOF'
{
"osImages": [],
"gatewayAppId": "any",
"kms": {
"mrAggregated": [],
"devices": [],
"allowAnyDevice": true
},
"apps": {}
}
EOF
cat > /tmp/kms-self-auth/auth-dst.json <<'EOF'
{
"osImages": [],
"gatewayAppId": "any",
"kms": {
"mrAggregated": [],
"devices": [],
"allowAnyDevice": true
},
"apps": {}
}
EOF

These placeholder configs intentionally deny all KMS boots until you populate kms.mrAggregated with the measured source or destination KMS values for the current run.
Start the services:
cd "$REPO_ROOT/kms/auth-simple"
AUTH_CONFIG_PATH=/tmp/kms-self-auth/auth-src.json PORT=3101 bun run start \
>/tmp/kms-self-auth/auth-src.log 2>&1 &
echo $! >/tmp/kms-self-auth/auth-src.pid
AUTH_CONFIG_PATH=/tmp/kms-self-auth/auth-dst.json PORT=3102 bun run start \
>/tmp/kms-self-auth/auth-dst.log 2>&1 &
echo $! >/tmp/kms-self-auth/auth-dst.pid

Health check:
curl -sf http://127.0.0.1:3101/ | jq .
curl -sf http://127.0.0.1:3102/ | jq .

Deploy two KMS CVMs using the existing KMS deployment workflow.
Requirements for both VMs:
- core.onboard.enabled = true
- core.onboard.auto_bootstrap_domain = ""
- core.auth_api.type = "webhook"
Point them at different auth services. If using host-local auth-simple with QEMU user-mode networking:
- kms-src → http://10.0.2.2:3101
- kms-dst → http://10.0.2.2:3102
Recommended deploy method: use port forwarding (--port) instead of gateway. Gateway requires the auth API to return a gatewayAppId at boot, which makes testing harder. With port forwarding, the KMS onboard and runtime endpoints are directly accessible on the host:
vmm-cli.py deploy --name kms-src ... --port tcp:0.0.0.0:9301:8000
vmm-cli.py deploy --name kms-dst ... --port tcp:0.0.0.0:9302:8000

If you need an example deployment template, adapt the flow in:
docs/tutorials/kms-cvm-deployment.md
Record these values:
# With port forwarding:
export KMS_SRC_ONBOARD='http://127.0.0.1:9301'
export KMS_DST_ONBOARD='http://127.0.0.1:9302'
export KMS_SRC_RUNTIME='https://127.0.0.1:9301'
export KMS_DST_RUNTIME='https://127.0.0.1:9302'

Notes:
- The onboard endpoint serves plain HTTP, so use http:// for KMS_*_ONBOARD
- After bootstrap/onboard + /finish, the KMS restarts with TLS; use https:// for KMS_*_RUNTIME
- The source_url in Onboard.Onboard must be reachable from inside the CVM (e.g., https://10.0.2.2:9301/prpc)
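The last note is easy to get wrong, because the host-side variables above point at 127.0.0.1. As a minimal sketch, assuming QEMU user-mode networking (host visible at 10.0.2.2), the hypothetical helper cvm_url below rewrites a loopback URL into its CVM-reachable form and leaves any other URL untouched. It expects a URL that includes a scheme.

```shell
# Hypothetical helper: rewrite a host-side loopback URL to the QEMU
# user-mode host address (10.0.2.2) so it is reachable from inside a CVM.
# Non-loopback URLs are printed unchanged.
cvm_url() {
  cu_scheme=${1%%://*}   # text before "://"
  cu_rest=${1#*://}      # host[:port][/path]
  case $cu_rest in
    127.0.0.1|127.0.0.1[:/]*) cu_rest="10.0.2.2${cu_rest#127.0.0.1}" ;;
    localhost|localhost[:/]*) cu_rest="10.0.2.2${cu_rest#localhost}" ;;
  esac
  printf '%s://%s\n' "$cu_scheme" "$cu_rest"
}
```

With the port-forwarding values above, "$(cvm_url "${KMS_SRC_RUNTIME%/}")/prpc" evaluates to https://10.0.2.2:9301/prpc, which is the form Onboard.Onboard needs as source_url.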
Wait until the onboard endpoint is actually ready before continuing. A simple probe loop is recommended:
until curl -sk -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' -d '{}' >/dev/null 2>&1; do
echo "waiting for kms-src onboard endpoint..."
sleep 10
done
until curl -sk -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' -d '{}' >/dev/null 2>&1; do
echo "waiting for kms-dst onboard endpoint..."
sleep 10
done

# Capture the attestation info of both instances for the current run.
curl -sf -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' \
-d '{}' | tee /tmp/kms-self-auth/kms-src-attestation.json | jq .
curl -sf -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' \
-d '{}' | tee /tmp/kms-self-auth/kms-dst-attestation.json | jq .

Expected fields:
- device_id
- mr_aggregated
- os_image_hash
- attestation_mode
Extract values:
SRC_OS=$(jq -r '.os_image_hash' /tmp/kms-self-auth/kms-src-attestation.json)
SRC_MR=$(jq -r '.mr_aggregated' /tmp/kms-self-auth/kms-src-attestation.json)
SRC_DEV=$(jq -r '.device_id' /tmp/kms-self-auth/kms-src-attestation.json)
DST_OS=$(jq -r '.os_image_hash' /tmp/kms-self-auth/kms-dst-attestation.json)
DST_MR=$(jq -r '.mr_aggregated' /tmp/kms-self-auth/kms-dst-attestation.json)
DST_DEV=$(jq -r '.device_id' /tmp/kms-self-auth/kms-dst-attestation.json)

All three values above are expected to be hex strings without the 0x prefix. When writing auth-simple config, prepend 0x.
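jq -r prints the literal string null when a field is missing, and a value that already carries 0x would end up double-prefixed once it is written into a config. A small guard, sketched here under the assumption that every measurement is plain hex, catches both cases before any policy file is generated. The name norm_hex is hypothetical.

```shell
# Hypothetical helper: normalize a measurement to bare lowercase hex.
# Accepts input with or without a leading 0x; rejects empty input, "null",
# and anything that contains a non-hex character.
norm_hex() {
  nh=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  nh=${nh#0x}
  case $nh in
    ''|*[!0-9a-f]*)
      echo "norm_hex: not a hex string: '$1'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$nh"
}
```

A typical use is SRC_MR=$(norm_hex "$SRC_MR") || exit 1, repeated for each extracted value, so that a failed extraction stops the run instead of producing a policy that silently denies everything.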
Use a wrong mrAggregated value while allowing the observed OS image.
cat > /tmp/kms-self-auth/deny-by-mr.json <<'EOF'
{
"osImages": ["0xREPLACE_OS"],
"gatewayAppId": "any",
"kms": {
"mrAggregated": ["0x0000000000000000000000000000000000000000000000000000000000000000"],
"devices": [],
"allowAnyDevice": true
},
"apps": {}
}
EOF

cat > /tmp/kms-self-auth/allow-single.json <<'EOF'
{
"osImages": ["0xREPLACE_OS"],
"gatewayAppId": "any",
"kms": {
"mrAggregated": ["0xREPLACE_MR"],
"devices": [],
"allowAnyDevice": true
},
"apps": {}
}
EOF

cat > /tmp/kms-self-auth/allow-src-and-dst.json <<'EOF'
{
"osImages": ["0xREPLACE_SRC_OS", "0xREPLACE_DST_OS"],
"gatewayAppId": "any",
"kms": {
"mrAggregated": ["0xREPLACE_SRC_MR", "0xREPLACE_DST_MR"],
"devices": [],
"allowAnyDevice": true
},
"apps": {}
}
EOF

Create concrete variants:
sed "s/REPLACE_OS/$SRC_OS/g; s/REPLACE_MR/$SRC_MR/g" \
/tmp/kms-self-auth/allow-single.json \
>/tmp/kms-self-auth/auth-src-allow-self.json
sed "s/REPLACE_OS/$SRC_OS/g" \
/tmp/kms-self-auth/deny-by-mr.json \
>/tmp/kms-self-auth/auth-src-deny-self.json
sed "s/REPLACE_SRC_OS/$SRC_OS/g; s/REPLACE_DST_OS/$DST_OS/g; s/REPLACE_SRC_MR/$SRC_MR/g; s/REPLACE_DST_MR/$DST_MR/g" \
/tmp/kms-self-auth/allow-src-and-dst.json \
>/tmp/kms-self-auth/auth-src-allow-both.json
sed "s/REPLACE_OS/$SRC_OS/g; s/REPLACE_MR/$SRC_MR/g" \
/tmp/kms-self-auth/allow-single.json \
>/tmp/kms-self-auth/auth-dst-allow-src.json
sed "s/REPLACE_OS/$SRC_OS/g" \
/tmp/kms-self-auth/deny-by-mr.json \
>/tmp/kms-self-auth/auth-dst-deny-src.json

Because auth-simple hot reloads its config on every request, switching policy is just a file copy:
cp /tmp/kms-self-auth/auth-src-deny-self.json /tmp/kms-self-auth/auth-src.json
cp /tmp/kms-self-auth/auth-src-allow-self.json /tmp/kms-self-auth/auth-src.json
cp /tmp/kms-self-auth/auth-src-allow-both.json /tmp/kms-self-auth/auth-src.json
cp /tmp/kms-self-auth/auth-dst-deny-src.json /tmp/kms-self-auth/auth-dst.json
cp /tmp/kms-self-auth/auth-dst-allow-src.json /tmp/kms-self-auth/auth-dst.json

Verify that a KMS refuses bootstrap if the auth API denies its own measurements.
- Make sure kms-src is still fresh and not bootstrapped yet.
- Apply the deny-self policy to auth-simple-src:
cp /tmp/kms-self-auth/auth-src-deny-self.json /tmp/kms-self-auth/auth-src.json
- Call bootstrap:
# No -f here: with -f, curl drops the body of a non-2xx response, and the denial must be captured.
curl -s -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.Bootstrap?json" \
  -H 'Content-Type: application/json' \
  -d '{"domain":"kms-src.example.test"}' \
  | tee /tmp/kms-self-auth/bootstrap-src-denied.json | jq .

Expected result:
- the response contains .error
- the error should indicate bootstrap was denied because the KMS is not allowed

Acceptable examples:
- KMS is not allowed to bootstrap
- Boot denied: ...
If bootstrap succeeds under the deny policy, the self-check is not working.
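The check on the saved response can be scripted. The sketch below is a textual match rather than a full JSON parse, and the function name is_bootstrap_denial is hypothetical. Its patterns are copied from the acceptable examples above, so other KMS builds may word the error differently.

```shell
# Hypothetical helper: succeed only if a saved Onboard.Bootstrap response
# carries an "error" field whose text matches one of the denial messages
# this guide lists as acceptable. An empty or missing file counts as
# "not a denial", because it means no response was captured at all.
is_bootstrap_denial() {
  [ -s "$1" ] &&
    grep -Eq '"error"[[:space:]]*:' "$1" &&
    grep -Eq 'KMS is not allowed to bootstrap|Boot denied' "$1"
}
```

Running is_bootstrap_denial /tmp/kms-self-auth/bootstrap-src-denied.json || echo "self-check did not deny bootstrap" turns the manual inspection into a pass/fail signal.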
Verify that bootstrap succeeds once the same KMS is explicitly allowed.
- Switch auth-simple-src to allow kms-src:
cp /tmp/kms-self-auth/auth-src-allow-self.json /tmp/kms-self-auth/auth-src.json
- Retry bootstrap:
curl -sf -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.Bootstrap?json" \
  -H 'Content-Type: application/json' \
  -d '{"domain":"kms-src.example.test"}' \
  | tee /tmp/kms-self-auth/bootstrap-src-allowed.json | jq .
- Finish onboarding mode so the process can restart into normal TLS KMS mode:
curl -sf "${KMS_SRC_ONBOARD%/}/finish"
- Wait for the runtime KMS endpoint to become available and record it as:
export KMS_SRC_RUNTIME='https://<kms-src-runtime-host>'
On teepod-style deployments, this is often the -8000s URL rather than the original onboard -8000 URL.
- Probe runtime metadata:
curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetMeta?json" \
  | tee /tmp/kms-self-auth/kms-src-meta.json | jq .

Expected result:
- bootstrap returns ca_pubkey, k256_pubkey, and attestation
- /finish returns OK
- KMS.GetMeta succeeds after restart
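The "wait for the runtime endpoint" step has no built-in limit, and the earlier probe loops retry forever. A bounded variant, sketched below with the hypothetical name wait_for, gives an unattended agent a clear failure instead of a hang. The attempt count and delay are arbitrary placeholders to tune per environment.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts
# run out. Output of the probed command is discarded.
# Usage: wait_for <attempts> <sleep-seconds> <command> [args...]
wait_for() {
  wf_left=$1
  wf_sleep=$2
  shift 2
  while [ "$wf_left" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wf_left=$((wf_left - 1))
    # Sleep only if another attempt follows.
    if [ "$wf_left" -gt 0 ]; then
      sleep "$wf_sleep"
    fi
  done
  echo "wait_for: gave up: $*" >&2
  return 1
}
```

For the runtime probe this could look like wait_for 30 10 curl -skf "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetMeta?json". The -f flag is intentional in this case, so that an HTTP error counts as "not ready yet".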
Verify that the onboarding receiver rejects a source KMS whose attestation is denied by the receiver's auth API.
For this scenario to reach the receiver-side check:
- auth-simple-src must allow both kms-src and kms-dst
  - kms-src must allow itself, because trusted RPC self-checks run on the source
  - kms-src must also allow kms-dst, because GetKmsKey verifies the caller KMS
- auth-simple-dst must initially deny kms-src
- Apply source policy that allows both KMS instances:
cp /tmp/kms-self-auth/auth-src-allow-both.json /tmp/kms-self-auth/auth-src.json
- Apply receiver policy that denies kms-src:
cp /tmp/kms-self-auth/auth-dst-deny-src.json /tmp/kms-self-auth/auth-dst.json
- Pick the source_url base that kms-dst will call. It must be reachable from inside the CVM, so with port forwarding use the QEMU host address rather than the host-side 127.0.0.1 value in KMS_SRC_RUNTIME. The variable name KMS_SRC_FROM_CVM is only a local convention for this test case:
# Port-forwarding example; with a gateway URL, set this to "$KMS_SRC_RUNTIME" instead.
export KMS_SRC_FROM_CVM='https://10.0.2.2:9301'
- Attempt onboarding from kms-dst:
# No -f here: with -f, curl drops the body of a non-2xx response, and the denial must be captured.
curl -s -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.Onboard?json" \
  -H 'Content-Type: application/json' \
  -d "{\"source_url\":\"${KMS_SRC_FROM_CVM%/}/prpc\",\"domain\":\"kms-dst.example.test\"}" \
  | tee /tmp/kms-self-auth/onboard-dst-denied.json | jq .

Expected result:
- the response contains .error
- the error should indicate the source KMS is not allowed, or onboarding failed because source authorization was denied

Then repeat with the source allowed:
- Switch receiver policy to allow kms-src:
cp /tmp/kms-self-auth/auth-dst-allow-src.json /tmp/kms-self-auth/auth-dst.json
- Retry onboarding:
curl -sf -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.Onboard?json" \
  -H 'Content-Type: application/json' \
  -d "{\"source_url\":\"${KMS_SRC_FROM_CVM%/}/prpc\",\"domain\":\"kms-dst.example.test\"}" \
  | tee /tmp/kms-self-auth/onboard-dst-allowed.json | jq .
- Finish onboarding mode on kms-dst:
curl -sf "${KMS_DST_ONBOARD%/}/finish"
- Wait for the runtime endpoint and record:
export KMS_DST_RUNTIME='https://<kms-dst-runtime-host>'
Again, when TLS passthrough is used, prefer the -8000s URL for runtime KMS RPCs.
- Probe runtime metadata:
curl -sk "${KMS_DST_RUNTIME%/}/prpc/KMS.GetMeta?json" \
  | tee /tmp/kms-self-auth/kms-dst-meta.json | jq .

Expected result:
- first onboard attempt is rejected
- second onboard attempt succeeds
- kms-dst starts normally after /finish
Verify that a running KMS re-checks its own authorization on trusted RPCs.
Use GetTempCaCert first. It is simpler than GetAppKey because it does not require preparing an attested app client, but it still exercises the new runtime self-check.
- While kms-src is healthy, confirm the canary RPC works:
curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetTempCaCert?json" \
  | tee /tmp/kms-self-auth/get-temp-ca-allowed.json | jq .
- Flip auth-simple-src to deny kms-src itself:
cp /tmp/kms-self-auth/auth-src-deny-self.json /tmp/kms-self-auth/auth-src.json
- Retry the same RPC:
curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetTempCaCert?json" \
  | tee /tmp/kms-self-auth/get-temp-ca-denied.json | jq .

Expected result:
- before the policy flip: GetTempCaCert succeeds
- after the policy flip: the response contains .error
- the error should indicate KMS self-authorization failed, or that the KMS is not allowed

If you already have tooling for attested app/KMS clients, also verify:
- GetKmsKey fails when source KMS denies itself
- GetAppKey fails when KMS denies itself
- SignCert fails when KMS denies itself
The important part is that the running KMS must not rely only on bootstrap-time authorization.
KMS now always requires attestation. For local development without TDX hardware, use sdk/simulator so bootstrap, onboard, and trusted RPC flows still exercise the quoted path.
- Start the simulator:
cd dstack/sdk/simulator
./build.sh
./dstack-simulator
- Point the guest agent client at the simulator endpoint as documented in the SDK README.
- Run KMS locally against the simulator-backed guest agent.
- Verify bootstrap and trusted RPCs still produce attestation-backed behavior.

Expected result:
- local development still uses the same quote-required logic
- there is no separate no-quote KMS mode to validate anymore
- simulator-backed development should be treated as the replacement for the old noquote/dev workflow
For each run, save:
- Onboard.GetAttestationInfo output for every KMS
- auth config snapshots used for each step
- bootstrap/onboard RPC responses
- KMS.GetMeta output after successful boot
- GetTempCaCert allow/deny responses
- relevant CVM logs if a step fails unexpectedly
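The live config files are overwritten on every policy switch, so the "auth config snapshots" item needs an explicit copy at each step. One possible shape is the hypothetical helper snapshot_policy below. The policy-<step>-<file> naming scheme is an assumption, not something the repository prescribes.

```shell
# Hypothetical helper: copy the live auth config into an evidence directory,
# tagged with the step it was used for.
# Usage: snapshot_policy <step-name> <live-config.json> <evidence-dir>
snapshot_policy() {
  sp_dir=$3
  mkdir -p "$sp_dir" || return 1
  cp "$2" "$sp_dir/policy-$1-$(basename "$2")"
}
```

For example, snapshot_policy tc1-deny /tmp/kms-self-auth/auth-src.json /tmp/kms-self-auth/evidence, run right after the cp that applies the deny-self policy, stores the snapshot under /tmp/kms-self-auth so it ends up in the same archive as the RPC responses.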
Recommended archive:
tar czf /tmp/kms-self-auth-results.tar.gz /tmp/kms-self-auth

Stop local auth services:
kill "$(cat /tmp/kms-self-auth/auth-src.pid)" || true
kill "$(cat /tmp/kms-self-auth/auth-dst.pid)" || true

Then remove test CVMs using your normal vmm-cli.py remove or teepod cleanup flow.
The change is considered validated if all of the following are true:
- bootstrap fails under deny policy
- bootstrap succeeds after self allowlisting
- onboarding rejects a denied source KMS on the receiver side
- runtime trusted RPCs stop working after the source KMS is removed from the allowlist
- local development without TDX hardware is expected to use sdk/simulator rather than a no-quote KMS mode
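The first four criteria can be cross-checked against the response files saved during the run. The sketch below assumes the file names used by the tee commands in this guide and treats any response containing an "error" field as a denial. It is a textual check with a hypothetical function name, check_evidence, and it cannot confirm why a call was denied.

```shell
# Hypothetical summary check over the response files saved by this runbook.
# Each entry is "<expected>:<file>", where expected is "error" or "ok".
# Usage: check_evidence <evidence-dir>
check_evidence() {
  ce_dir=$1
  ce_fail=0
  for ce_item in \
    error:bootstrap-src-denied.json \
    ok:bootstrap-src-allowed.json \
    error:onboard-dst-denied.json \
    ok:onboard-dst-allowed.json \
    ok:get-temp-ca-allowed.json \
    error:get-temp-ca-denied.json
  do
    ce_want=${ce_item%%:*}
    ce_file=$ce_dir/${ce_item#*:}
    if [ ! -s "$ce_file" ]; then
      ce_got=missing          # nothing was captured for this step
    elif grep -Eq '"error"[[:space:]]*:' "$ce_file"; then
      ce_got=error
    else
      ce_got=ok
    fi
    if [ "$ce_got" = "$ce_want" ]; then
      echo "PASS $ce_file"
    else
      echo "FAIL $ce_file (want $ce_want, got $ce_got)"
      ce_fail=1
    fi
  done
  return "$ce_fail"
}
```

check_evidence /tmp/kms-self-auth prints one PASS or FAIL line per file and returns non-zero if any expectation is not met. The simulator criterion is a workflow expectation and still needs a manual check.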