This document describes a manual, AI-executable integration flow for validating:
- KMS bootstrap
- KMS onboard from an existing KMS
- post-onboard trusted runtime RPCs
It is intentionally written as a deployment runbook so an AI agent can execute it step by step on teepod / dstack-vmm without depending on kms/e2e/.
This guide covers the normal happy-path flow:
- deploy `kms-src`
- bootstrap `kms-src`
- finish `kms-src`
- deploy `kms-dst`
- onboard `kms-dst` from `kms-src`
- finish `kms-dst`
- probe trusted runtime RPCs on the running KMS
It also includes a compact deny-case matrix for common service-rejection paths so a deployment run can validate both success and failure behavior in one pass.
For a deeper authorization-focused runbook, also see:
tests/docs/kms-self-authorization.md
Host / operator machine
├── auth-simple-src (policy for source KMS)
├── auth-simple-dst (policy for destination KMS)
├── kms-src (bootstrapped first)
└── kms-dst (onboarded from kms-src)
Both KMS instances are expected to run with attestation enabled. For local development without TDX hardware, use sdk/simulator.
Policy reminder:
- source-side auth must allow:
  - `kms-src` itself
  - `kms-dst` when it calls `GetKmsKey` during onboarding
- destination-side auth must allow:
  - `kms-src` during onboarding
  - `kms-dst` itself before you probe trusted runtime RPCs on `kms-dst`
Before starting, make sure the following are available:
- a KMS image or branch containing the code under test
- a working teepod / dstack-vmm target
- routable HTTPS entrypoints for onboard and runtime RPC
- `curl`, `jq`, Python 3, and `bun`
- an auth service such as `kms/auth-simple`, or an equivalent webhook
Recommended references:
- `docs/tutorials/kms-cvm-deployment.md`
- `docs/tutorials/troubleshooting-kms-deployment.md`
- `kms/auth-simple/README.md`
- `tests/docs/kms-self-authorization.md`
Operational notes:
- Prefer a prebuilt KMS image.
- `Boot Progress: done` does not guarantee the onboard endpoint is ready.
- The onboarding completion endpoint is GET `/finish`.
- On teepod with gateway, onboard mode usually uses the `-8000` URL, while runtime TLS KMS RPC usually uses the `-8000s` URL. Port forwarding (`--port tcp:0.0.0.0:<host-port>:8000`) is simpler than gateway for testing, because gateway requires the auth API to return a `gatewayAppId` at boot time.
- If you use a very small custom webhook instead of the real auth service, `KMS.GetMeta` may fail because `auth_api.get_info()` expects extra chain / contract metadata fields. In that case, use `GetTempCaCert` as the runtime readiness probe.
- dstack CVMs use QEMU user-mode networking, so the host is reachable at `10.0.2.2` from inside the CVM. The `source_url` in `Onboard.Onboard` must use a CVM-reachable address (e.g., `https://10.0.2.2:<port>/prpc`), not `127.0.0.1`.
- Remote KMS attestation previously had an empty `osImageHash`. This is fixed: RA-TLS certs now use the unified `PHALA_RATLS_ATTESTATION` format, which preserves `vm_config`. For old source KMS instances, the receiver-side check fills `osImageHash` from the local KMS's own value automatically. No special `"0x"` entry in `osImages` is needed anymore.
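Two of the notes above (endpoint readiness and CVM-reachable addresses) lend themselves to small shell helpers. The sketch below is illustrative only: `wait_until` and `cvm_reachable_url` are hypothetical names, not part of dstack or teepod, and the URL helper assumes the form `scheme://host[:port][/path]` and QEMU user-mode networking.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() {
  local max="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$max" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$max" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical helper: rewrite a loopback URL so it is reachable from inside
# a CVM, where the host is visible as 10.0.2.2. Other hosts pass through.
cvm_reachable_url() {
  local url="$1" scheme rest hostport path host port
  scheme="${url%%://*}"        # e.g. "https"
  rest="${url#*://}"           # e.g. "127.0.0.1:9201/prpc"
  hostport="${rest%%/*}"       # e.g. "127.0.0.1:9201"
  path="${rest#"$hostport"}"   # e.g. "/prpc" (may be empty)
  host="${hostport%%:*}"       # e.g. "127.0.0.1"
  port="${hostport#"$host"}"   # e.g. ":9201" (may be empty)
  case "$host" in
    127.0.0.1|localhost) host="10.0.2.2" ;;
  esac
  printf '%s://%s%s%s\n' "$scheme" "$host" "$port" "$path"
}

# Example (not executed here): poll the onboard endpoint at most 30 times.
#   wait_until 30 10 curl -sk -X POST \
#     "${KMS_SRC_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
#     -H 'Content-Type: application/json' -d '{}'
```

A bounded retry lets an unattended run fail with a clear exit status instead of hanging. Adding curl's `-f` flag to the polled command would also treat HTTP error statuses as "not ready"; whether that is appropriate depends on how the onboard endpoint responds while it is starting up.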
export REPO_ROOT="$(git rev-parse --show-toplevel)"
mkdir -p /tmp/kms-bootstrap-onboard
cd /tmp/kms-bootstrap-onboard

Use two independently controllable auth services:
- one for `kms-src`
- one for `kms-dst`
They can be:
- Preferred: host-local, accessed from CVMs via `http://10.0.2.2:<port>` (QEMU host gateway)
- public services
- sidecars inside each KMS deployment
At minimum, both policies must allow the KMS instance they serve. During onboard, source-side policy must also allow the destination KMS caller.
For `auth-simple`, `kms.mrAggregated = []` is a deny-all policy for KMS. Add the current KMS MR values explicitly when switching a test from deny to allow.
You no longer need `"0x"` in the `osImages` array: the receiver-side check now resolves `osImageHash` automatically.
Deploy both KMS instances in onboard mode with:
- `core.onboard.enabled = true`
- `core.onboard.auto_bootstrap_domain = ""`
- `core.auth_api.type = "webhook"`
Record:
export KMS_SRC_ONBOARD='https://<kms-src-onboard-host>/'
export KMS_DST_ONBOARD='https://<kms-dst-onboard-host>/'

Wait until the onboard endpoints actually respond:

until curl -sk -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' -d '{}' >/dev/null 2>&1; do
echo "waiting for kms-src onboard endpoint..."
sleep 10
done
until curl -sk -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' -d '{}' >/dev/null 2>&1; do
echo "waiting for kms-dst onboard endpoint..."
sleep 10
done

Capture initial attestation info:

curl -sk -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' -d '{}' \
| tee /tmp/kms-bootstrap-onboard/kms-src-att.json | jq .
curl -sk -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.GetAttestationInfo?json" \
-H 'Content-Type: application/json' -d '{}' \
| tee /tmp/kms-bootstrap-onboard/kms-dst-att.json | jq .

Bootstrap `kms-src`:

curl -sk -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.Bootstrap?json" \
-H 'Content-Type: application/json' \
-d '{"domain":"kms-src.example.test"}' \
| tee /tmp/kms-bootstrap-onboard/kms-src-bootstrap.json | jq .

Expected result:
- response contains:
  - `ca_pubkey`
  - `k256_pubkey`
  - `attestation`
- no `.error`
Finish `kms-src`:

curl -sk "${KMS_SRC_ONBOARD%/}/finish" \
| tee /tmp/kms-bootstrap-onboard/kms-src-finish.txt

export KMS_SRC_RUNTIME='https://<kms-src-runtime-host>'

On teepod, this is typically the `-8000s` style URL.

curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetMeta?json" \
| tee /tmp/kms-bootstrap-onboard/kms-src-meta.json | jq .

Expected result:
- `KMS.GetMeta` succeeds when the configured auth service implements `auth_api.get_info()`-compatible fields
- returned metadata includes:
  - `ca_cert`
  - `k256_pubkey`
  - `bootstrap_info`

If KMS.GetMeta fails because your minimal webhook does not return chain / contract info, use GetTempCaCert below as the runtime readiness probe instead.
Before this step:
- destination-side auth must allow `kms-src`
- source-side auth must allow `kms-dst` to call `GetKmsKey`
- if you plan to probe trusted runtime RPCs on `kms-dst` immediately after onboard, destination-side auth must also allow `kms-dst` itself

curl -sk -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.Onboard?json" \
-H 'Content-Type: application/json' \
-d "{\"source_url\":\"${KMS_SRC_RUNTIME%/}/prpc\",\"domain\":\"kms-dst.example.test\"}" \
| tee /tmp/kms-bootstrap-onboard/kms-dst-onboard.json | jq .

Expected result:
- response is `{}` or otherwise empty success
- no `.error`
Finish `kms-dst`:

curl -sk "${KMS_DST_ONBOARD%/}/finish" \
| tee /tmp/kms-bootstrap-onboard/kms-dst-finish.txt

export KMS_DST_RUNTIME='https://<kms-dst-runtime-host>'

Again, on teepod this is usually the `-8000s` style URL.

curl -sk "${KMS_DST_RUNTIME%/}/prpc/KMS.GetMeta?json" \
| tee /tmp/kms-bootstrap-onboard/kms-dst-meta.json | jq .

Expected result:
- `KMS.GetMeta` succeeds when the configured auth service implements `auth_api.get_info()`-compatible fields
- `kms-dst` now serves as a normal runtime KMS

If KMS.GetMeta fails because your minimal webhook does not return chain / contract info, continue with the trusted RPC probes below. Those are the better canary for this manual flow.
This section folds the runtime trusted-RPC verification into the same flow.
| Case | Policy change | Expected failure point | Typical error shape |
|---|---|---|---|
| bootstrap deny | source-side auth leaves `kms.mrAggregated` empty or omits the current `kms-src` MR | `Onboard.Bootstrap` on `kms-src` | KMS is not allowed to bootstrap, MR aggregated not allowed |
| onboard deny (receiver-side) | destination-side auth leaves `kms.mrAggregated` empty or omits the current `kms-src` MR | `Onboard.Onboard` on `kms-dst` | source KMS not allowed / onboarding failed |
| onboard deny (source-side) | source-side auth leaves `kms.mrAggregated` empty or omits the current `kms-dst` MR | `Onboard.Onboard` on `kms-dst` | source rejected destination caller / `GetKmsKey` authorization failed |
| runtime deny | auth removes the running KMS from `kms.mrAggregated` | `GetTempCaCert` or another trusted RPC | KMS self authorization failed, KMS is not allowed |

Use the happy-path steps below first, then flip policies one by one and rerun the indicated probe.

Probe `KMS.GetTempCaCert` on `kms-src`:

curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetTempCaCert?json" \
| tee /tmp/kms-bootstrap-onboard/kms-src-get-temp-ca.json | jq .

Expected result:
- success
- response contains:
  - `temp_ca_cert`
  - `temp_ca_key`
  - `ca_cert`

`KMS.GetKmsKey` is normally exercised by onboard itself, so you can treat a successful onboard as proof that:
- source KMS accepted the destination KMS as an attested caller
- source KMS returned its shared keys
If you want a standalone explicit probe, use an attested KMS client path and call:
- `KMS.GetKmsKey`

Expected result:
- succeeds only for an attested / authorized KMS caller

`KMS.GetAppKey` requires an attested app caller plus a valid `vm_config`.
Expected result:
- success for an attested and authorized app caller
- returned fields should include app key material and `gateway_app_id`

`KMS.SignCert` requires a valid CSR plus verified attestation.
Expected result:
- success for a valid attested app CSR
- returned `certificate_chain` is non-empty

After a normal happy-path run, flip source-side auth policy to deny kms-src itself and retry:
curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetTempCaCert?json" \
| tee /tmp/kms-bootstrap-onboard/kms-src-get-temp-ca-after-deny.json | jq .

Expected result:
- trusted runtime RPCs fail after the KMS is no longer authorized
This overlaps with kms-self-authorization.md, but is useful as a quick post-deploy sanity check.
To make this flow more robust, add these negative checks to the same run and save each failure response as evidence.
Before the successful bootstrap run, configure source-side auth so that kms-src is not allowlisted by MR (for example, leave kms.mrAggregated empty), then call:
curl -sk -X POST "${KMS_SRC_ONBOARD%/}/prpc/Onboard.Bootstrap?json" \
-H 'Content-Type: application/json' \
-d '{"domain":"kms-src.example.test"}' \
| tee /tmp/kms-bootstrap-onboard/kms-src-bootstrap-denied.json | jq .

Expected result:
- response contains `.error`
- error indicates the KMS itself is not allowed to bootstrap

Then allowlist kms-src and rerun the normal bootstrap flow.
Before the successful onboard run, make destination-side policy leave kms-src out of kms.mrAggregated (for example, keep it empty), then call:
curl -sk -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.Onboard?json" \
-H 'Content-Type: application/json' \
-d "{\"source_url\":\"${KMS_SRC_RUNTIME%/}/prpc\",\"domain\":\"kms-dst.example.test\"}" \
| tee /tmp/kms-bootstrap-onboard/kms-dst-onboard-denied.json | jq .

Expected result:
- response contains `.error`
- the error indicates the receiver refused the source KMS, source authorization failed, or onboarding failed before keys were accepted

Then restore destination-side allowlists.
Make source-side policy leave kms-dst out of kms.mrAggregated, then call the same onboard request again:
curl -sk -X POST "${KMS_DST_ONBOARD%/}/prpc/Onboard.Onboard?json" \
-H 'Content-Type: application/json' \
-d "{\"source_url\":\"${KMS_SRC_RUNTIME%/}/prpc\",\"domain\":\"kms-dst.example.test\"}" \
| tee /tmp/kms-bootstrap-onboard/kms-dst-onboard-denied-by-src.json | jq .

Expected result:
- response contains `.error`
- the error indicates the source KMS rejected the destination KMS caller, or `GetKmsKey` authorization failed

Then restore both source-side and destination-side allowlists and rerun the normal onboard flow.
After a successful bootstrap or onboard, remove the running KMS's own MR from kms.mrAggregated and retry:
curl -sk "${KMS_SRC_RUNTIME%/}/prpc/KMS.GetTempCaCert?json" \
| tee /tmp/kms-bootstrap-onboard/kms-src-get-temp-ca-denied.json | jq .

Expected result:
- response contains `.error`
- error indicates KMS self authorization failed or the KMS is not allowed

You can repeat the same check on kms-dst after onboard by removing kms-dst from destination-side policy and retrying KMS.GetTempCaCert.
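The deny cases above all share one pass condition: the saved response must carry an `.error`. A minimal sketch of that check follows. `assert_denied` is a hypothetical helper name, and the optional text fragment is an assumption about how the error is worded, so treat a fragment mismatch as a prompt to read the response rather than as a definitive failure.

```shell
# Hypothetical helper: succeed only if a saved response carries a top-level
# `.error`, optionally containing an expected fragment (case-insensitive).
# Usage: assert_denied <file> [expected_fragment]
assert_denied() {
  local file="$1" fragment="${2:-}" err
  # `// empty` yields no output when .error is missing or null.
  err="$(jq -r 'if type == "object" then (.error // empty) else empty end' \
    "$file" 2>/dev/null || true)"
  if [ -z "$err" ]; then
    echo "FAIL: $file has no .error (the request was not rejected)" >&2
    return 1
  fi
  if [ -n "$fragment" ] && ! printf '%s' "$err" | grep -qiF -- "$fragment"; then
    echo "FAIL: $file error does not mention '$fragment': $err" >&2
    return 1
  fi
  echo "OK (denied): $file"
}

# Example (not executed here):
#   assert_denied /tmp/kms-bootstrap-onboard/kms-src-bootstrap-denied.json
#   assert_denied /tmp/kms-bootstrap-onboard/kms-dst-onboard-denied.json
#   assert_denied /tmp/kms-bootstrap-onboard/kms-dst-onboard-denied-by-src.json
#   assert_denied /tmp/kms-bootstrap-onboard/kms-src-get-temp-ca-denied.json
```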
For each run, save:
- `Onboard.GetAttestationInfo` output for both KMS instances
- bootstrap response
- onboard response
- `/finish` responses
- runtime `KMS.GetMeta` responses
- trusted RPC responses such as `GetTempCaCert`
- deny-case responses such as `kms-src-bootstrap-denied.json`, `kms-dst-onboard-denied.json`, `kms-dst-onboard-denied-by-src.json`, and `kms-src-get-temp-ca-denied.json`
- auth policy snapshots used during the run
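Before archiving, it can help to confirm that the evidence files listed above were actually written and are non-empty. The sketch below uses a hypothetical helper, `check_evidence`; the file names in the example are the ones this runbook writes with `tee`, and auth policy snapshots are not covered because their file names are not defined here.

```shell
# Hypothetical helper: report which expected evidence files are missing or
# empty. Returns non-zero if any file is absent or has zero size.
# Usage: check_evidence <dir> <file>...
check_evidence() {
  local dir="$1" name missing=0
  shift
  for name in "$@"; do
    # -s is true only for files that exist and have a size greater than zero.
    if [ -s "$dir/$name" ]; then
      echo "present: $name"
    else
      echo "MISSING: $name" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example (not executed here):
#   check_evidence /tmp/kms-bootstrap-onboard \
#     kms-src-att.json kms-dst-att.json \
#     kms-src-bootstrap.json kms-dst-onboard.json \
#     kms-src-finish.txt kms-dst-finish.txt \
#     kms-src-get-temp-ca.json
```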
Recommended archive:
tar czf /tmp/kms-bootstrap-onboard-results.tar.gz /tmp/kms-bootstrap-onboard

The flow is considered validated if all of the following are true:
- `kms-src` bootstrap succeeds
- `kms-src` transitions to runtime mode successfully
- `kms-dst` onboard succeeds against `kms-src`
- `kms-dst` transitions to runtime mode successfully
- runtime metadata probes succeed on both KMS instances, or `GetTempCaCert` succeeds when `GetMeta` is unavailable with a minimal webhook
- at least one trusted runtime RPC such as `GetTempCaCert` succeeds
- the selected deny cases fail at the expected RPC with an authorization error

Remove the test CVMs using your normal teepod / vmm-cli.py remove flow.
If you ran host-local auth services, stop them as well.