From 5eeaa55b88a683de54a99b7bdba867ed27b4f817 Mon Sep 17 00:00:00 2001
From: Josh <joshua.krueger@dfx.swiss>
Date: Mon, 18 May 2026 17:49:28 +0200
Subject: [PATCH] =?UTF-8?q?feat(simulator):=20v0.5.0=20=E2=80=94=20BTC=20s?=
 =?UTF-8?q?cenarios=20+=20firmware=20matrix=20+=20conventional-commits=20a?=
 =?UTF-8?q?uto-tag=20(#5)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat(simulator): add BTC + root-fingerprint + legacy-polygon-sign scenarios

- BtcXpubZpubMainnet: BIP-84 native segwit zpub shape
- BtcAddressP2WPKHMainnet: bech32 bc1q P2WPKH derivation
- BtcAddressP2TRTaproot: bech32m bc1p P2TR derivation
- BtcSignMessageMainnet: 64-byte sig + 65-byte electrum envelope
- RootFingerprintDeterministic: pins 4c00739d (upstream fixture seed)
- EthSignLegacyPolygonMultiByteV: actually exercises CC-5 v-byte path
  (the existing chainId=137 address probe never did — addresses don't
  depend on chainId)
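
Why an address probe could not cover CC-5, as a minimal Go sketch. It
assumes the standard EIP-155 rule v = chainId*2 + 35 + recId; the
function names are illustrative and are not taken from scenarios.go.

```go
package main

import "fmt"

// eip155V returns the legacy-transaction v value defined by EIP-155:
// v = chainID*2 + 35 + recID, where recID is the 0/1 recovery id.
func eip155V(chainID uint64, recID uint8) uint64 {
	return chainID*2 + 35 + uint64(recID)
}

// fitsInOneByte reports whether v can be carried in a single byte.
func fitsInOneByte(v uint64) bool {
	return v <= 0xff
}

func main() {
	// Compare Ethereum mainnet (1) with Polygon (137).
	for _, chainID := range []uint64{1, 137} {
		for recID := uint8(0); recID <= 1; recID++ {
			v := eip155V(chainID, recID)
			fmt.Printf("chainId=%d recId=%d v=%d oneByte=%t\n",
				chainID, recID, v, fitsInOneByte(v))
		}
	}
}
```

For chainId=137 the sketch yields v = 309 or 310, which no longer fits
in one byte, while mainnet stays at 37 or 38. Only a signing call
produces v, so only a signing scenario can reach that boundary.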

Plus simulator.Connect helper (extracted from cmd/bitbox-simulator-check)
so the integration test, CLI, and any future consumer share the exact
Noise XX + channel-hash-verify bring-up. Integration test now runs the
full BaselineScenarios set on every push, surfacing any firmware drift
or scenario regression at testkit CI time instead of consumer time.

Fake TS proxy: add clearCalls, ignore symbol-keyed lookups, and return
undefined for then/catch/finally so that awaiting the proxy does not
treat it as a thenable. quirks.test.ts now reads quirks.json directly,
so it stays self-consistent across releases without a hardcoded count
bump every time a quirk is added.

* fix(simulator): encode umlaut KYC payload as JSON \u-escapes

The umlaut-rejection scenario's payload had three literal non-ASCII bytes
(ü, ß, ü) in a raw-string Go const, which the audit's quirk-E1 regex
flagged as a critical finding when self-auditing the testkit. Encoding
them as JSON \u00fc / \u00df keeps the SOURCE pure ASCII while the JSON
parser inside the BitBox SDK still resolves them to the exact same UTF-8
bytes a literal "ü" would produce — the scenario still exercises the
firmware reject path.
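
The equivalence this relies on can be checked with a small Go sketch.
The sample payload is hypothetical (not the scenario's real KYC data),
and the helper names are illustrative only.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// isASCII reports whether every byte of s is below 0x80.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 {
			return false
		}
	}
	return true
}

// decodeJSONString parses a JSON string literal into its Go string value.
func decodeJSONString(lit string) (string, error) {
	var out string
	err := json.Unmarshal([]byte(lit), &out)
	return out, err
}

func main() {
	// Raw string: the \u escapes stay as plain ASCII characters in source.
	escaped := `"M\u00fcller Stra\u00dfe"`
	// Same text with the UTF-8 bytes for the umlaut (C3 BC) and the
	// sharp s (C3 9F) written out explicitly.
	literal := "\"M\xc3\xbcller Stra\xc3\x9fe\""

	a, err := decodeJSONString(escaped)
	if err != nil {
		panic(err)
	}
	b, err := decodeJSONString(literal)
	if err != nil {
		panic(err)
	}
	fmt.Println("escaped source is ASCII:", isASCII(escaped))
	fmt.Println("literal source is ASCII:", isASCII(literal))
	fmt.Println("decoded bytes equal:", bytes.Equal([]byte(a), []byte(b)))
}
```

Both inputs decode to the same bytes, so a consumer of the parsed JSON
cannot tell which spelling the source file used.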

Removes 3 false-positive critical findings from the testkit's own
action-selftest job, and from any consumer who ever decides to point
their bitbox-audit at the testkit source tree.

* v0.5.0 prep: matrix-mode CLI + CI matrix + CHANGELOG backfill + ONBOARDING

- bitbox-simulator-check gains --firmware <name|all>; LaunchVersion +
  ErrSimulatorNotFound let any caller pin a specific embedded build.
- New CI job go-simulator-matrix drives the 14-scenario baseline against
  all 8 embedded firmwares (v9.19.0 → v9.26.1) in parallel on every push.
  Catches regressions that only surface on older firmwares still in the
  production tail — BitBox02 only auto-updates when the user opens the
  BitBoxApp.
- bitbox-simulator composite action exposes firmware: input; slash
  template parses firmware=X and ref=Y modifiers + 'fail' shorthand.
- Composite action defaults: bitbox-audit testkit-ref v0.2.0 → v0.5.0,
  bitbox-simulator v0.4.2 → v0.5.0. Workflow-templates bumped to match.
- CHANGELOG backfilled for every version between v0.3.1 and v0.4.4
  (previously only the v0.1.0/v0.2.0/v0.3.0/v0.3.1 entries existed) and
  the new v0.5.0 entry.
- ONBOARDING gains a §6 simulator section covering the 14 baseline
  scenarios, matrix mode, slash trigger, and what the simulator
  validates vs. doesn't (transport still needs a real device).

* align v0.5.0 with develop conventions

- Drop the JSON \u-escape workaround in scenarios.go; the audit-skip-file
  marker TaprootFreak added in PR #2 is the right per-file opt-out for
  intentional non-ASCII test fixtures, and matches the pattern already
  used in core/guards/*.go.
- Backfill CHANGELOG entries for v0.4.5 (Go module rename) and v0.4.6
  (auto-tag + auto-release-pr + audit-skip-file). v0.5.0 entry now points
  at the DFXswiss release URL and references test.yaml (not test.yml).
- ONBOARDING simulator example and ts/src/index.ts JSDoc now reference
  DFXswiss/bitbox-testkit consistently (ts/package.json was already at
  @DFXswiss).

* feat(release): conventional-commits-aware auto-tag

Replaces the hardcoded PATCH+1 logic in .github/workflows/auto-tag.yaml
with a small testable Go tool at go/cmd/release-version. The tool reads
every commit subject + body between the last release tag and HEAD,
parses them as Conventional Commits 1.0, and picks the highest bump:

  feat! / <type>! / BREAKING CHANGE: footer  -> MAJOR
  feat:                                       -> MINOR
  fix:, perf:, refactor:, revert:             -> PATCH
  chore:, ci:, docs:, test:, style:, build:   -> PATCH
  non-conventional subjects                   -> PATCH + warning

A single feat! anywhere in the range promotes the whole release to a
major bump; a single feat: promotes to minor. The aggregator is
pure: 31 table-driven tests in main_test.go lock every classification
arm + the SemVer math + the report shape consumers parse.
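
A minimal sketch of the classification and SemVer math described above.
It is not the code in go/cmd/release-version: the regex, the type and
function names, the choice to honor a BREAKING CHANGE footer on any
commit, and incrementing MAJOR on a 0.x version are assumptions made
for illustration. The real tool also emits warnings and a report.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type bump int

const (
	patch bump = iota
	minor
	major
)

type commit struct{ subject, body string }

// subjectRE matches "type(scope)!: description"; scope and "!" are optional.
var subjectRE = regexp.MustCompile(`^([a-zA-Z]+)(\([^)]*\))?(!)?: .+`)

// hasBreakingFooter reports whether any body line starts a breaking footer.
func hasBreakingFooter(body string) bool {
	for _, line := range strings.Split(body, "\n") {
		if strings.HasPrefix(line, "BREAKING CHANGE: ") ||
			strings.HasPrefix(line, "BREAKING-CHANGE: ") {
			return true
		}
	}
	return false
}

// classify maps one commit to the bump it votes for.
func classify(subject, body string) bump {
	if hasBreakingFooter(body) {
		return major // assumption: footer counts on any commit
	}
	m := subjectRE.FindStringSubmatch(subject)
	if m == nil {
		return patch // non-conventional subject falls back to patch
	}
	if m[3] == "!" {
		return major
	}
	if strings.EqualFold(m[1], "feat") {
		return minor
	}
	return patch
}

// highest returns the largest bump voted for by any commit in the range.
func highest(commits []commit) bump {
	top := patch
	for _, c := range commits {
		if b := classify(c.subject, c.body); b > top {
			top = b
		}
	}
	return top
}

// next applies a bump to a version, resetting the lower components.
func next(mj, mn, pt int, b bump) (int, int, int) {
	switch b {
	case major:
		return mj + 1, 0, 0
	case minor:
		return mj, mn + 1, 0
	default:
		return mj, mn, pt + 1
	}
}

func main() {
	commits := []commit{
		{"fix(simulator): encode umlaut KYC payload", ""},
		{"feat(simulator): add BTC scenarios", ""},
		{"align v0.5.0 with develop conventions", ""}, // non-conventional
	}
	mj, mn, pt := next(0, 4, 6, highest(commits))
	fmt.Printf("v%d.%d.%d\n", mj, mn, pt)
}
```

Starting from v0.4.6, the single feat: commit in this sample range
outvotes the fix and the non-conventional subject, so the sketch prints
v0.5.0.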

The auto-tag workflow now surfaces the per-commit breakdown as a CI
group so reviewers can see exactly which commit voted which way, and
short-circuits cleanly (exit code 4) when the range is empty.

CONTRIBUTING.md "Releases" rewritten with the new policy: a
commit-message -> bump table, the local preview command, and the
manual-release escape hatch for hotfixes.

Practical effect for v0.5.0: the feat(simulator): commit in this PR
will cause the auto-tagger to emit v0.5.0 (not v0.4.7) when the
develop -> main release PR merges, with no manual tag intervention.

* maintainer-edit: fix broken CHANGELOG links + atomic dual-tag push

CHANGELOG had 13 release links pointing at
github.com/joshuakrueger-dfx/bitbox-testkit/releases/tag/vX.Y.Z, but
that account no longer hosts releases — every linked page 404s. The
v0.4.5 entry also pointed at DFXswiss for a release that doesn't
exist yet. All historical release links now point at
DFXswiss/bitbox-testkit consistently; the actual GitHub-Release
backfill for v0.3.2 → v0.4.5 is a separate maintenance task and
doesn't gate the v0.5.0 cut.

auto-tag.yaml now uses `git push --atomic` for the vX.Y.Z + go/vX.Y.Z
pair. Without it, a partial push (server-side ref protection trip,
network blip on the second ref) could leave the repo with one tag
present and the other missing — and the next auto-tag run would fail
the "tag exists" check while consumers' `go install ...@vX.Y.Z` would
still 404 on the missing submodule tag. The --atomic flag tells the
server to apply both updates as a single transaction or neither.

---------

Co-authored-by: TaprootFreak <142087526+TaprootFreak@users.noreply.github.com>
---
 .github/actions/bitbox-audit/action.yml       |   4 +-
 .github/actions/bitbox-simulator/action.yml   |  13 +-
 .../workflow-templates/bitbox-audit-slash.yml |   4 +-
 .github/workflow-templates/bitbox-audit.yml   |   2 +-
 .../bitbox-simulator-slash.yml                |  14 +-
 .../workflow-templates/bitbox-simulator.yml   |   2 +-
 .github/workflows/auto-tag.yaml               | 109 ++++--
 .github/workflows/test.yaml                   |  48 +++
 CHANGELOG.md                                  | 136 ++++++++
 CONTRIBUTING.md                               |  42 ++-
 ONBOARDING.md                                 |  64 ++++
 go/bitbox/simulator/connect.go                | 123 +++++++
 go/bitbox/simulator/integration_test.go       |  73 +++-
 go/bitbox/simulator/scenarios.go              | 262 ++++++++++++++-
 go/bitbox/simulator/simulator.go              |  56 +++-
 go/cmd/bitbox-simulator-check/main.go         | 171 +++++-----
 go/cmd/release-version/main.go                | 317 ++++++++++++++++++
 go/cmd/release-version/main_test.go           | 238 +++++++++++++
 ts/src/fake/index.ts                          |  27 +-
 ts/src/index.ts                               |  10 +-
 ts/test/fake.test.ts                          |  24 ++
 ts/test/quirks.test.ts                        |   6 +-
 22 files changed, 1602 insertions(+), 143 deletions(-)
 create mode 100644 go/bitbox/simulator/connect.go
 create mode 100644 go/cmd/release-version/main.go
 create mode 100644 go/cmd/release-version/main_test.go

diff --git a/.github/actions/bitbox-audit/action.yml b/.github/actions/bitbox-audit/action.yml
index faee63d..7c4f87e 100644
--- a/.github/actions/bitbox-audit/action.yml
+++ b/.github/actions/bitbox-audit/action.yml
@@ -10,13 +10,13 @@ inputs:
   testkit-ref:
     description: >
       Git ref of DFXswiss/bitbox-testkit to install. Pin to a
-      tag (v0.2.0) for reproducibility; use 'main' during testkit
+      tag (v0.5.0) for reproducibility; use 'main' during testkit
       development to track the bleeding edge. The sentinel value
       'local' is reserved for the testkit's own self-test workflow
       and builds the CLI from the checked-out source — consumers
       should never set it.
     required: false
-    default: v0.2.0
+    default: v0.5.0
   firmware:
     description: >
       Optional firmware version (e.g. 9.23.0) to narrow the quirk set to
diff --git a/.github/actions/bitbox-simulator/action.yml b/.github/actions/bitbox-simulator/action.yml
index c099627..75f1f47 100644
--- a/.github/actions/bitbox-simulator/action.yml
+++ b/.github/actions/bitbox-simulator/action.yml
@@ -16,7 +16,7 @@ inputs:
       controls the action.yml itself. Mismatched refs leave the CLI
       running with stale embedded SHA pins — keep these in sync.
     required: false
-    default: v0.4.2
+    default: v0.5.0
   comment-on-pr:
     description: >
       'true' to post the markdown report as a sticky PR comment.
@@ -37,6 +37,15 @@ inputs:
       red check just because the simulator can't run there.
     required: false
     default: 'false'
+  firmware:
+    description: >
+      Specific embedded firmware name (e.g.
+      bitbox02-multi-v9.21.0-simulator1.0.0-linux-amd64) or 'all' to
+      run the matrix over every embedded firmware. Default '' = newest.
+      Matrix mode catches regressions that only surface on older
+      firmwares still in the production tail.
+    required: false
+    default: ''
 
 outputs:
   report-path:
@@ -74,11 +83,13 @@ runs:
       env:
         FAIL_ON_FINDINGS: ${{ inputs.fail-on-findings }}
         FAIL_ON_SKIP: ${{ inputs.fail-on-skip }}
+        FIRMWARE: ${{ inputs.firmware }}
       run: |
         set -o pipefail
         mkdir -p .bitbox-simulator
         flags=()
         if [ "$FAIL_ON_SKIP" = "true" ]; then flags+=(--fail-on-skip); fi
+        if [ -n "$FIRMWARE" ]; then flags+=(--firmware "$FIRMWARE"); fi
 
         # JSON first so the markdown render can reuse the SAME run.
         rc=0
diff --git a/.github/workflow-templates/bitbox-audit-slash.yml b/.github/workflow-templates/bitbox-audit-slash.yml
index caf3be7..89dd69e 100644
--- a/.github/workflow-templates/bitbox-audit-slash.yml
+++ b/.github/workflow-templates/bitbox-audit-slash.yml
@@ -68,7 +68,7 @@ jobs:
             const args = body.replace(/^\/bitbox-audit\s*/, '').split(/\s+/).filter(Boolean);
             let firmware = '';
             let failOnFindings = 'false';
-            let testkitRef = 'v0.2.0';
+            let testkitRef = 'v0.5.0';
             for (const tok of args) {
               if (tok === 'fail') { failOnFindings = 'true'; continue; }
               const [k, v] = tok.split('=');
@@ -115,7 +115,7 @@ jobs:
           TESTKIT_REF: ${{ needs.guard.outputs.testkit-ref }}
           FIRMWARE: ${{ needs.guard.outputs.firmware }}
 
-      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-audit@v0.2.0
+      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-audit@v0.5.0
         with:
           testkit-ref: ${{ needs.guard.outputs.testkit-ref }}
           firmware: ${{ needs.guard.outputs.firmware }}
diff --git a/.github/workflow-templates/bitbox-audit.yml b/.github/workflow-templates/bitbox-audit.yml
index b50df34..498d730 100644
--- a/.github/workflow-templates/bitbox-audit.yml
+++ b/.github/workflow-templates/bitbox-audit.yml
@@ -34,4 +34,4 @@ jobs:
     timeout-minutes: 15
     steps:
       - uses: actions/checkout@v4
-      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-audit@v0.2.0
+      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-audit@v0.5.0
diff --git a/.github/workflow-templates/bitbox-simulator-slash.yml b/.github/workflow-templates/bitbox-simulator-slash.yml
index a5a47a4..bc5d098 100644
--- a/.github/workflow-templates/bitbox-simulator-slash.yml
+++ b/.github/workflow-templates/bitbox-simulator-slash.yml
@@ -32,6 +32,8 @@ jobs:
       authorized: ${{ steps.authz.outputs.authorized }}
       sha: ${{ steps.parse.outputs.sha }}
       testkit-ref: ${{ steps.parse.outputs.testkit_ref }}
+      firmware: ${{ steps.parse.outputs.firmware }}
+      fail-on-findings: ${{ steps.parse.outputs.fail_on_findings }}
     steps:
       - name: Authorization
         id: authz
@@ -60,10 +62,14 @@ jobs:
           script: |
             const body = (context.payload.comment.body || '').trim();
             const args = body.replace(/^\/bitbox-simulator\s*/, '').split(/\s+/).filter(Boolean);
-            let testkitRef = 'v0.3.0';
+            let testkitRef = 'v0.5.0';
+            let firmware = '';
+            let failOnFindings = 'false';
             for (const tok of args) {
+              if (tok === 'fail') { failOnFindings = 'true'; continue; }
               const [k, v] = tok.split('=');
               if (k === 'ref' && v) testkitRef = v;
+              if (k === 'firmware' && v) firmware = v;
             }
             const { data: pr } = await github.rest.pulls.get({
               ...context.repo,
@@ -71,6 +77,8 @@ jobs:
             });
             core.setOutput('sha', pr.head.sha);
             core.setOutput('testkit_ref', testkitRef);
+            core.setOutput('firmware', firmware);
+            core.setOutput('fail_on_findings', failOnFindings);
             await github.rest.reactions.createForIssueComment({
               ...context.repo,
               comment_id: context.payload.comment.id,
@@ -86,6 +94,8 @@ jobs:
       - uses: actions/checkout@v4
         with:
           ref: ${{ needs.guard.outputs.sha }}
-      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-simulator@v0.3.0
+      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-simulator@v0.5.0
         with:
           testkit-ref: ${{ needs.guard.outputs.testkit-ref }}
+          firmware: ${{ needs.guard.outputs.firmware }}
+          fail-on-findings: ${{ needs.guard.outputs.fail-on-findings }}
diff --git a/.github/workflow-templates/bitbox-simulator.yml b/.github/workflow-templates/bitbox-simulator.yml
index 43aa3a7..b68a858 100644
--- a/.github/workflow-templates/bitbox-simulator.yml
+++ b/.github/workflow-templates/bitbox-simulator.yml
@@ -33,4 +33,4 @@ jobs:
     timeout-minutes: 15
     steps:
       - uses: actions/checkout@v4
-      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-simulator@v0.3.0
+      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-simulator@v0.5.0
diff --git a/.github/workflows/auto-tag.yaml b/.github/workflows/auto-tag.yaml
index c0912c8..07da6fb 100644
--- a/.github/workflows/auto-tag.yaml
+++ b/.github/workflows/auto-tag.yaml
@@ -1,5 +1,18 @@
 name: Auto Tag on Merge
 
+# Conventional-Commits-aware release tagger. On every push to main:
+#   1. Find the latest vX.Y.Z tag.
+#   2. Run go/cmd/release-version against the latest-tag..HEAD range —
+#      it parses each commit subject + body as Conventional Commits and
+#      picks the highest bump (feat! / BREAKING CHANGE → major,
+#      feat: → minor, fix/perf/refactor/chore/ci/... → patch).
+#   3. Create the dual tags vX.Y.Z + go/vX.Y.Z (Go submodule resolver
+#      needs both at the same commit, see CONTRIBUTING.md Releases).
+#   4. Create the matching GitHub Release with auto-generated notes.
+#
+# Exit code 4 from the version tool means "no commits in the range"
+# and short-circuits the rest of the job — that prevents an empty
+# re-tag if a maintainer pushes the same commit twice.
 on:
   push:
     branches: [main]
@@ -22,48 +35,87 @@ jobs:
           fetch-depth: 0
           fetch-tags: true
 
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.24'
+          cache: true
+          cache-dependency-path: go/go.sum
+
       - name: Get latest tag
         id: get-tag
         run: |
           LATEST_TAG=$(git tag -l --sort=-v:refname | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | head -n 1)
 
           if [ -z "$LATEST_TAG" ]; then
-            echo "No existing tags found, starting with v0.0.0"
-            LATEST_TAG="v0.0.0"
+            echo "::notice::No existing release tag found, this run will emit the initial release"
+            LATEST_TAG=""
+          else
+            echo "::notice::Latest tag: $LATEST_TAG"
           fi
 
-          echo "latest_tag=$LATEST_TAG" >> $GITHUB_OUTPUT
-          echo "::notice::Latest tag: $LATEST_TAG"
+          echo "latest_tag=$LATEST_TAG" >> "$GITHUB_OUTPUT"
 
-      - name: Calculate next version
+      - name: Decide next version (Conventional Commits)
         id: next-version
+        working-directory: go
+        env:
+          LATEST_TAG: ${{ steps.get-tag.outputs.latest_tag }}
         run: |
-          LATEST_TAG="${{ steps.get-tag.outputs.latest_tag }}"
+          set +e
+          NEW_TAG=$(go run ./cmd/release-version --base "$LATEST_TAG" --report 2>release-version.stderr)
+          rc=$?
+          set -e
+
+          # Surface the tool's stderr (warnings) in the job log regardless
+          # of whether the tool succeeded; useful when a reviewer needs
+          # to see why a commit was classified the way it was.
+          if [ -s release-version.stderr ]; then
+            echo "::group::release-version warnings"
+            cat release-version.stderr
+            echo "::endgroup::"
+          fi
 
-          VERSION="${LATEST_TAG#v}"
-          MAJOR=$(echo "$VERSION" | cut -d. -f1)
-          MINOR=$(echo "$VERSION" | cut -d. -f2)
-          PATCH=$(echo "$VERSION" | cut -d. -f3)
+          # The tool prints the next version on the first stdout line
+          # followed by a blank line and the report. Extract just the
+          # tag for downstream steps; emit the rest as a group.
+          FIRST_LINE=$(printf '%s' "$NEW_TAG" | head -n 1)
+          REST=$(printf '%s' "$NEW_TAG" | tail -n +2)
+          if [ -n "$REST" ]; then
+            echo "::group::release-version per-commit breakdown"
+            printf '%s\n' "$REST"
+            echo "::endgroup::"
+          fi
 
-          NEW_PATCH=$((PATCH + 1))
-          NEW_TAG="v${MAJOR}.${MINOR}.${NEW_PATCH}"
+          if [ "$rc" -eq 4 ]; then
+            echo "::notice::No commits since $LATEST_TAG — skipping release"
+            echo "skip=true" >> "$GITHUB_OUTPUT"
+            exit 0
+          fi
+          if [ "$rc" -ne 0 ]; then
+            echo "::error::release-version exited $rc"
+            exit "$rc"
+          fi
 
-          echo "new_tag=$NEW_TAG" >> $GITHUB_OUTPUT
-          echo "::notice::New tag: $NEW_TAG"
+          echo "::notice::Next tag: $FIRST_LINE (was: ${LATEST_TAG:-<none>})"
+          echo "new_tag=$FIRST_LINE" >> "$GITHUB_OUTPUT"
+          echo "skip=false" >> "$GITHUB_OUTPUT"
 
       - name: Check if tag exists
         id: check-tag
+        if: steps.next-version.outputs.skip != 'true'
         run: |
           NEW_TAG="${{ steps.next-version.outputs.new_tag }}"
           if git rev-parse "$NEW_TAG" >/dev/null 2>&1; then
-            echo "::error::Tag $NEW_TAG already exists!"
-            echo "exists=true" >> $GITHUB_OUTPUT
-          else
-            echo "exists=false" >> $GITHUB_OUTPUT
+            echo "::error::Tag $NEW_TAG already exists — refusing to overwrite. \
+            Push an empty fixup commit to bump again, or delete the existing tag if it was misapplied."
+            echo "exists=true" >> "$GITHUB_OUTPUT"
+            exit 1
           fi
+          echo "exists=false" >> "$GITHUB_OUTPUT"
 
       - name: Create tag and release
-        if: steps.check-tag.outputs.exists == 'false'
+        if: steps.next-version.outputs.skip != 'true' && steps.check-tag.outputs.exists == 'false'
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           NEW_TAG: ${{ steps.next-version.outputs.new_tag }}
@@ -72,15 +124,22 @@ jobs:
           git config user.name "github-actions[bot]"
           git config user.email "github-actions[bot]@users.noreply.github.com"
 
-          # Go submodule convention: the module at /go/ needs a `go/vX.Y.Z`
-          # tag on the same commit, otherwise `go install` cannot resolve
-          # the package — see CONTRIBUTING.md "Releases".
+          # Dual-tag for the Go submodule pattern. Without `go/vX.Y.Z`,
+          # `go install github.com/DFXswiss/bitbox-testkit/go/cmd/...@vX.Y.Z`
+          # fails because Go's module resolver looks for the path-prefixed
+          # tag when the module lives in a subdirectory.
           GO_TAG="go/${NEW_TAG}"
           git tag -a "$NEW_TAG" -m "Release $NEW_TAG"
           git tag -a "$GO_TAG" -m "$GO_TAG: submodule tag matching $NEW_TAG"
-          git push origin "$NEW_TAG" "$GO_TAG"
-
-          if [ "$PREV_TAG" = "v0.0.0" ]; then
+          # --atomic so the server applies both ref updates as a single
+          # transaction. Without it a partial push could leave the repo
+          # with vX.Y.Z but no go/vX.Y.Z (or vice versa), and the next
+          # auto-tag run would fail the "tag exists" check while the
+          # consumer-facing `go install` would still 404 on the missing
+          # submodule tag.
+          git push --atomic origin "$NEW_TAG" "$GO_TAG"
+
+          if [ -z "$PREV_TAG" ]; then
             gh release create "$NEW_TAG" --title "$NEW_TAG" --generate-notes
           else
             gh release create "$NEW_TAG" --title "$NEW_TAG" --generate-notes --notes-start-tag "$PREV_TAG"
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
index d1227f7..f4c2efe 100644
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -61,6 +61,54 @@ jobs:
           mkdir -p "$WALLET_TESTKIT_SIMCACHE"
           go test -tags simulator -timeout 5m ./bitbox/simulator/...
 
+  go-simulator-matrix:
+    # Runs the full baseline against EVERY embedded firmware version
+    # (v9.19.0 → v9.26.1). Catches regressions where a new scenario
+    # works against the newest firmware but breaks against an older
+    # one a real user may still be running — important because the
+    # BitBox02 only auto-updates when the user opens the BitBoxApp,
+    # so production has a long tail of older firmwares in the wild.
+    name: simulator matrix · ${{ matrix.firmware }}
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: go
+    strategy:
+      fail-fast: false
+      matrix:
+        firmware:
+          - bitbox02-multi-v9.26.1-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.25.0-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.24.0-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.23.0-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.22.0-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.21.0-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.20.0-simulator1.0.0-linux-amd64
+          - bitbox02-multi-v9.19.0-simulator1.0.0-linux-amd64
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-go@v5
+        with:
+          go-version: '1.24'
+          cache: true
+          cache-dependency-path: go/go.sum
+      - name: cache simulator binaries
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/bitbox-testkit-simulators
+          key: bitbox-simulators-${{ hashFiles('go/bitbox/simulator/embedded.go') }}
+          restore-keys: |
+            bitbox-simulators-
+      - name: run scenarios against ${{ matrix.firmware }}
+        env:
+          WALLET_TESTKIT_SIMCACHE: ${{ github.workspace }}/.simcache
+        run: |
+          mkdir -p "$WALLET_TESTKIT_SIMCACHE"
+          go run -tags simulator ./cmd/bitbox-simulator-check \
+            --firmware "${{ matrix.firmware }}" \
+            --cache "$WALLET_TESTKIT_SIMCACHE" \
+            --format markdown
+
   ts-unit:
     name: TypeScript unit tests
     runs-on: ubuntu-latest
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5d6badf..6dd3408 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,142 @@
 
 All notable changes to bitbox-testkit. The project uses semantic versioning starting from v0.1.0.
 
+## v0.5.0 — 2026-05-17
+
+The "world-class" release: the baseline grows from 9 to 14 scenarios, covers Bitcoin alongside Ethereum, asserts a deterministic identity contract for the simulator seed, and now ships a multi-firmware matrix runner. The testkit self-audit is now green on its own source tree.
+
+### Added
+
+- **6 new simulator scenarios** in `go/bitbox/simulator/scenarios.go`:
+  - `RootFingerprintDeterministic` — pins the simulator's BIP-32 root fingerprint to `0x4c00739d`. If this fails, every other pinned-output assertion downstream is suspect, so consumers see one canonical "seed drifted" red signal instead of N derived symptoms.
+  - `EthSignLegacyPolygonMultiByteV` — actually drives `ETHSign(chainId=137)` to exercise the EIP-155 multi-byte v path (quirk CC-5). The pre-existing `EthAddressPolygonMultiByteV` only queried an address, which is identical regardless of chain id and therefore never tested the v-byte boundary.
+  - `BtcXpubZpubMainnet` — BIP-84 native-segwit ZPUB shape (zpub-prefix + base58 length envelope).
+  - `BtcAddressP2WPKHMainnet` — bech32 `bc1q` receive address at `m/84'/0'/0'/0/0`.
+  - `BtcAddressP2TRTaproot` — bech32m `bc1p` Taproot address at `m/86'/0'/0'/0/0` (distinct firmware codepath from P2WPKH).
+  - `BtcSignMessageMainnet` — Bitcoin signed-message envelope (64-byte R||S, recId 0..3, 65-byte Electrum sig with header byte 31..34).
+- **`simulator.Connect` helper** factors the Noise XX bring-up + channel-hash-verify wait out of the CLI so the integration test, `cmd/bitbox-simulator-check`, and any future consumer share one canonical implementation. Tunable via `ConnectOptions{HandshakeTimeout, Logger}`.
+- **`simulator.LaunchVersion(cacheDir, name)` + `ErrSimulatorNotFound`** lets a caller pin a specific embedded binary instead of always getting the newest one.
+- **`bitbox-simulator-check --firmware <name|all>`** flag: matrix-runs the full baseline against every embedded firmware (v9.19.0 → v9.26.1) and emits a `MatrixReport` with a per-firmware pass/fail table. Single-firmware runs remain shape-compatible (`MatrixReport.Reports[0]` is the legacy `Report`).
+- **CI: `go-simulator-matrix` job** in the testkit's own `test.yaml` runs the baseline against each of the 8 embedded firmwares in parallel on every push, catching regressions that only surface on the long tail of older firmwares users still have in production.
+- **CI: `TestSimulatorBaselineScenarios`** integration test executes every scenario against the real firmware on every push, replacing the previous "Launch only" smoke check. Surfaces scenario regressions at testkit-CI time instead of consumer time.
+
+### Changed
+
+- `cmd/bitbox-simulator-check`: report wire format is now `MatrixReport { Reports: [...] }` even for single-firmware runs. The CLI's exit-code semantics are unchanged (max of per-firmware exit codes).
+- TS `FakePairedBitBox`: proxy now ignores symbol-keyed lookups and returns `undefined` for `then` / `catch` / `finally`, so awaiting the proxy no longer infects the awaiter chain as if the proxy were thenable. Plus new `clearCalls()` for tests that want to reset the recorded call log mid-flight without releasing the fake.
+
+[v0.5.0]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.5.0
+
+## v0.4.6 — 2026-05-16
+
+CI and release plumbing aligned with the bitbox_flutter / DFXswiss org template — `develop` is now the default branch, push-to-develop opens an auto-release-pr (develop → main), merging that triggers an auto-tag that patch-bumps `vX.Y.Z` AND `go/vX.Y.Z` (Go submodule resolver needs both at the same commit).
+
+`composite action self-test` was failing on the umlaut payload inside `go/bitbox/simulator/scenarios.go`; suppressed via `audit-skip-file` (the umlauts are intentional test fixtures for quirk E1). `quirks.test.ts` switched from a hardcoded `30` count to a self-deriving assertion against the imported `quirks.json` so the test never goes stale when a quirk is added.
+
+[v0.4.6]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.6
+
+## v0.4.5 — 2026-05-16
+
+Go module path renamed from `github.com/joshuakrueger-dfx/bitbox-testkit` to `github.com/DFXswiss/bitbox-testkit` — the canonical home is now the org-owned mirror. Consumers must update their `go install` / `go.mod` references; composite-action references should also be re-pointed from the personal account to `DFXswiss/`.
+
+[v0.4.5]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.5
+
+## v0.4.4 — 2026-05-16
+
+`audit-skip-line` marker is now actually implemented in the audit-runner's scan loop. Before this it was documented for four releases (v0.4.0–v0.4.3) but the detector silently ignored the marker, so doc comments demonstrating an anti-pattern got flagged as real code. Particularly affected dfx-wallet's `bitbox.ts:605` and `types.ts:62`.
+
+### Added
+
+- `cmd/bitbox-audit/detect.go`: `isLineSuppressed` honours `audit-skip-line` on the offending source line OR the line directly above (matches the natural "comment on the line above" pattern). Three new tests in `audit_test.go` lock the behaviour.
+
+[v0.4.4]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.4
+
+## v0.4.3 — 2026-05-16
+
+Action default bump only — composite `bitbox-simulator` action now defaults `testkit-ref` to v0.4.2, so consumers who pin the action ref without overriding the CLI version pick up the umlaut-reject scenario automatically.
+
+[v0.4.3]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.3
+
+## v0.4.2 — 2026-05-16
+
+### Added
+
+- `EthSignTypedDataNonAsciiRejected` scenario: feeds the 13-field RealUnitUser EIP-712 payload with German umlauts (ü, ß) and asserts the BitBox firmware REJECTS with ErrInvalidInput101. Pins the quirk-E1 firmware contract — a future firmware that silently started accepting non-ASCII would make consumer-side `toBitboxSafeAscii` transliteration load-bearing on one firmware version and dead code on the next. Failing-as-expected here is the GREEN state.
+
+[v0.4.2]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.2
+
+## v0.4.1 — 2026-05-16
+
+Action default bump only — `testkit-ref` default → v0.4.0.
+
+[v0.4.1]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.1
+
+## v0.4.0 — 2026-05-16
+
+### Added
+
+- `EthSignTypedDataKycMultiPage` scenario: signs the exact 13-field RealUnitUser EIP-712 typed-data realunit-app's KYC onboarding uses. On a physical BitBox each string field renders as its own confirmation page ("1/13" → "13/13"); the simulator auto-confirms each page but the firmware still walks the full multi-page state machine. Guards the BLE-Dedup-Bug code path (1/13 → 2/13 transition, fixed 2026-05-14) and the antiklepto host-nonce-commitment exchange.
+
+[v0.4.0]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.4.0
+
+## v0.3.9 — 2026-05-16
+
+### Fixed
+
+- Composite `bitbox-simulator` action: sticky-comment step is now gated on `comment-on-pr && pull_request && head.repo.full_name == github.repository`, so fork PRs (which never get a write-scope token) no longer fail the whole job on the comment step. `continue-on-error: true` belt-and-braces.
+
+[v0.3.9]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.9
+
+## v0.3.8 — 2026-05-16
+
+### Fixed
+
+- `bitbox-simulator-check`: EIP-1559 scenario now uses a realistic payload (≈0.53 ETH at 6 gwei to a real-looking recipient) instead of zero-everything. The firmware refuses zero-recipient + zero-value + zero-gas combinations as obviously-malformed; the previous payload masked itself as a "firmware bug" when in fact every value being zero is the bug.
+
+[v0.3.8]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.8
+
+## v0.3.7 — 2026-05-16
+
+### Fixed
+
+- `bitbox-simulator-check`: switched simulator bring-up from `SetPassword(32)` to `RestoreFromMnemonic()` (upstream test pattern). `SetPassword` puts the device into a "showing newly-generated mnemonic" state, after which every ETH endpoint rejected calls with "can't call this endpoint: wrong state".
+
+[v0.3.7]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.7
+
+## v0.3.6 — 2026-05-16
+
+Action default bump → v0.3.5 (handshake fix).
+
+[v0.3.6]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.6
+
+## v0.3.5 — 2026-05-16
+
+### Fixed
+
+- `bitbox-simulator-check`: after `firmware.Device.Init()` we now poll `ChannelHash()` until the simulator reports `verified=true`, then call `ChannelHashVerify(true)`. Without this step every post-pair API call fails with "handshake must come first" — `Init` alone is not sufficient for the simulator firmware to unlock its endpoint surface.
+
+[v0.3.5]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.5
+
+## v0.3.4 — 2026-05-16
+
+Action default bump.
+
+[v0.3.4]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.4
+
+## v0.3.3 — 2026-05-16
+
+### Fixed
+
+- `go/core/simulator/simulator.go`: hash-mismatch error now surfaces the expected vs actual SHA-256 in the error message. Previously the error said only "hash mismatch", giving no signal whether upstream had reproducibly rebuilt the artefact or whether a real supply-chain alarm was firing.
+
+[v0.3.3]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.3
+
+## v0.3.2 — 2026-05-16
+
+Re-release of v0.3.1 from a fresh commit. Go's module proxy and checksum database treat a published version as immutable and keep serving the first content they saw for it, so force-retagging the original v0.3.1 commit after a botched embedded.go update did not propagate to consumers via `go install`. v0.3.2 is the consumer-visible "v0.3.1 with the SHA pins actually correct".
+
+[v0.3.2]: https://github.com/DFXswiss/bitbox-testkit/releases/tag/v0.3.2
+
 ## v0.3.1 — 2026-05-16
 
 Patch: refresh SHA-256 pins for the three most-recent simulator binaries (v9.24.0, v9.25.0, v9.26.1) after Shift Crypto reproducibly rebuilt the upstream artefacts. Behaviour is unchanged — the rebuild only altered build-metadata (timestamps, paths). The five older versions (v9.19.0–v9.23.0) still match their original pins.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a8427d3..454f158 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -83,15 +83,47 @@ A Scenario returns a configured fake suitable for use in a single test. Two ques
 
 ## Releases
 
-Tags follow `v0.MAJOR.MINOR` semver. The TypeScript package and the Go module both pick up the same tag — there's no separate cadence. CHANGELOG.md must be updated as part of the release commit.
+Tags follow semver `vMAJOR.MINOR.PATCH`. The TypeScript package and the Go module both pick up the same tag — there's no separate cadence. CHANGELOG.md is updated as part of the change PR (not as a separate "release commit").
 
-The Go module lives at `/go/`, so Go's submodule-tagging convention requires **two** tags pointing at the same commit: `vX.Y.Z` for the repo / composite-action ref, and `go/vX.Y.Z` for `go install` to resolve the package. Without the `go/` prefixed tag, consumers hit:
+### Automatic flow (the normal path)
+
+Releases happen automatically off `main`. Once a PR merges into `develop`, the `Auto Release PR` workflow opens a `Release: develop -> main` PR. When that PR is merged, the `Auto Tag on Merge` workflow runs, looks at every commit between the previous tag and the new `main` HEAD, parses each subject as Conventional Commits, picks the highest bump, and creates **both** tags (`vX.Y.Z` and `go/vX.Y.Z`) plus the matching GitHub Release.
+
+The Go module lives at `/go/`, so Go's submodule-tagging convention requires the dual tag at the same commit. Without the `go/` prefixed tag, consumers hit:
 
 > `module github.com/DFXswiss/bitbox-testkit@vX.Y.Z found, but does not contain package …/go/cmd/bitbox-audit`
 
+#### Commit message → bump table
+
+The auto-tagger reads Conventional Commits 1.0. Use these subjects in your PR commits:
+
+| Subject prefix                           | Bump      | Example                                            |
+| ---------------------------------------- | --------- | -------------------------------------------------- |
+| `feat!:`, `fix!:`, `<type>!:`            | **MAJOR** | `feat!: drop legacy bitbox-api v0.11 support`      |
+| `BREAKING CHANGE:` in commit body        | **MAJOR** | (paired with any subject)                          |
+| `feat:`, `feat(scope):`                  | **MINOR** | `feat(simulator): add BTC scenarios`               |
+| `fix:`, `perf:`, `refactor:`, `revert:`  | **PATCH** | `fix(audit): suppress doc-comment false positive`  |
+| `chore:`, `ci:`, `docs:`, `test:`, `style:`, `build:` | **PATCH** | `ci: cache go modules`              |
+| (anything else)                          | **PATCH** + warning | the auto-tagger logs a warning to the CI step |
+
+The aggregator picks the **highest** bump across every commit in the range — one `feat!:` is enough to promote the whole release to a major bump, one `feat:` is enough for a minor.
+
+#### Local preview
+
+Before merging a release PR you can preview the version the auto-tagger will pick:
+
 ```bash
-# Bump version in /ts/package.json
-# Update CHANGELOG.md
+go run -C go ./cmd/release-version --base "$(git describe --tags --abbrev=0 --match='v*.*.*')" --report
+```
+
+The first stdout line is the next tag; the rest is a per-commit explanation of why each commit voted the way it did.
+
+### Manual release (escape hatch)
+
+If you ever need to ship out-of-band — e.g. an emergency security fix from a hotfix branch — push the tags by hand:
+
+```bash
+# Update CHANGELOG.md, /ts/package.json version
 git commit -am "Release vX.Y.Z"
 
 # Two tags, one commit. The 'go/' prefix is required by Go's
@@ -101,3 +133,5 @@ git tag -a go/vX.Y.Z -m "go/vX.Y.Z: submodule tag matching vX.Y.Z" vX.Y.Z^{}
 
 git push origin main --tags
 ```
+
+When the auto-tagger next runs, it will see the manual tags in `git tag -l` and pick the bump relative to them — no special-case handling needed.
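
To make the bump table in the CONTRIBUTING.md hunk above concrete, here is a minimal Go sketch of the "highest bump wins" aggregation. It is not the code of `cmd/release-version`: the names `classify`, `highest`, and `next` are hypothetical, the regular expression is a simplification of Conventional Commits 1.0, and treating a breaking change on a 0.x series as a MAJOR bump is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Bump is ordered so that a larger value always wins during aggregation.
type Bump int

const (
	Patch Bump = iota
	Minor
	Major
)

// subjectRE matches "type", an optional "(scope)", an optional "!", then ": ".
var subjectRE = regexp.MustCompile(`^([a-z]+)(\([^)]*\))?(!)?: `)

// commit holds the two parts of a commit message the table refers to.
type commit struct{ subject, body string }

// classify maps one commit to the bump it votes for (hypothetical helper).
func classify(subject, body string) Bump {
	if strings.Contains(body, "BREAKING CHANGE:") {
		return Major
	}
	m := subjectRE.FindStringSubmatch(subject)
	if m == nil {
		return Patch // unrecognised subject: PATCH (the table says the tool also warns)
	}
	if m[3] == "!" {
		return Major
	}
	if m[1] == "feat" {
		return Minor
	}
	return Patch
}

// highest returns the largest bump voted for by any commit in the range.
func highest(commits []commit) Bump {
	top := Patch
	for _, c := range commits {
		if b := classify(c.subject, c.body); b > top {
			top = b
		}
	}
	return top
}

// next applies a bump to a (major, minor, patch) triple and formats the tag.
func next(major, minor, patch int, b Bump) string {
	switch b {
	case Major:
		return fmt.Sprintf("v%d.0.0", major+1)
	case Minor:
		return fmt.Sprintf("v%d.%d.0", major, minor+1)
	default:
		return fmt.Sprintf("v%d.%d.%d", major, minor, patch+1)
	}
}

func main() {
	commits := []commit{
		{"fix(audit): suppress doc-comment false positive", ""},
		{"feat(simulator): add BTC scenarios", ""},
		{"ci: cache go modules", ""},
	}
	fmt.Println(next(0, 4, 2, highest(commits))) // prints v0.5.0
}
```

Ordering the `Bump` constants lets the aggregation reduce to a plain maximum, which mirrors the rule that a single `feat:` promotes the whole range to a minor release.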
diff --git a/ONBOARDING.md b/ONBOARDING.md
index 678c5de..75011e9 100644
--- a/ONBOARDING.md
+++ b/ONBOARDING.md
@@ -155,6 +155,70 @@ The sticky comment shows:
 - **Coverage buckets**: which quirks are statically detected, which are covered by passing runtime tests, which have failing tests, which are untested.
 - **Untested quirks**: explicit list. Each one is a gap until you add a test for it.
 
+### 6 · End-to-end against the real firmware (`bitbox-simulator`)
+
+The audit catches anti-patterns in your source. The simulator action catches what only the actual BitBox firmware can tell you: does your wire format round-trip, does pairing complete, do multi-page typed-data signs hold, does the firmware accept the bytes you're about to ship to a user's device.
+
+`.github/workflows/bitbox-simulator.yml`:
+
+```yaml
+name: bitbox-simulator
+on:
+  pull_request:
+    paths:
+      - 'src/**'
+      - 'test/**'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pull-requests: write
+
+jobs:
+  simulator:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: DFXswiss/bitbox-testkit/.github/actions/bitbox-simulator@v0.5.0
+        with:
+          testkit-ref: v0.5.0
+          # Optional:
+          #   firmware: bitbox02-multi-v9.21.0-simulator1.0.0-linux-amd64
+          #   firmware: all      # matrix over every embedded firmware
+```
+
+What runs: the action downloads the SHA-pinned upstream BitBox02 simulator binary, brings it up via `Init → ChannelHash → ChannelHashVerify`, restores the deterministic fixture seed (root fingerprint `4c00739d`), then runs the **15 baseline scenarios** in order:
+
+1. `pair_and_device_info` — Noise XX + DeviceInfo
+2. `restore_simulator_mnemonic` — deterministic seed
+3. `root_fingerprint_deterministic` — pins `4c00739d`
+4. `eth_address_mainnet` — chainId=1 BIP-44
+5. `eth_address_polygon_multibyte_v` — chainId=137 address
+6. `eth_sign_message_ascii` — short personal sign
+7. `eth_sign_message_boundary_1024` — firmware-doc max
+8. `eth_sign_legacy_polygon_multibyte_v` — actual EIP-155 sign at chainId=137
+9. `eth_sign_eip1559_mainnet` — type-2 tx, realistic payload
+10. `eth_sign_typed_data_kyc_multipage` — 13-field EIP-712 multi-page (1/13 → 13/13)
+11. `eth_sign_typed_data_non_ascii_rejected` — quirk E1 firmware-reject contract
+12. `btc_xpub_zpub_mainnet` — BIP-84 ZPUB shape
+13. `btc_address_p2wpkh_mainnet` — bech32 `bc1q…`
+14. `btc_address_p2tr_taproot` — bech32m `bc1p…`
+15. `btc_sign_message_mainnet` — 64-byte sig + 65-byte Electrum envelope
+
+Total run on a GitHub-hosted Linux runner: ~400ms scenarios + ~3s setup. The simulator is **Linux/amd64 only** — on macOS / Windows / arm runners the action exits cleanly with a `skipped` status (use `fail-on-skip: true` to make non-Linux a hard fail).
+
+#### Matrix mode
+
+Set `firmware: all` to drive every embedded firmware version (v9.19.0 → v9.26.1) in parallel. This catches regressions that only surface on older firmware versions still in the wild. The BitBox02 only auto-updates when the user opens the BitBoxApp, so production has a long tail.
+
+#### Slash trigger
+
+Comment `/bitbox-simulator` on any PR. Modifiers: `firmware=v9.21.0`, `firmware=all`, `ref=Y`, `fail`. Auth gated on `author_association ∈ {OWNER, MEMBER, COLLABORATOR}`.
+
+#### What it doesn't catch
+
+The simulator validates the `bitbox-api` ↔ firmware protocol surface. It does **not** validate your consumer's USB-HID / BLE transport layer against real hardware — that still requires a physical BitBox02. The two checks are complementary: audit covers source patterns, simulator covers firmware contract, hardware-on-the-table covers transport.
+
 ---
 
 ## Test naming convention (important)
diff --git a/go/bitbox/simulator/connect.go b/go/bitbox/simulator/connect.go
new file mode 100644
index 0000000..e927866
--- /dev/null
+++ b/go/bitbox/simulator/connect.go
@@ -0,0 +1,123 @@
+// Package simulator helpers for bringing a launched simulator instance
+// up to a "ready for scenarios" firmware.Device.
+//
+// Extracted from cmd/bitbox-simulator-check so the integration test, the
+// CLI, and any future consumer share the exact same Noise XX + channel-
+// hash auto-acknowledgment dance. A subtle change here (e.g. raising the
+// wait deadline) MUST land in every consumer at once; a shared helper
+// makes that mechanical.
+
+package simulator
+
+import (
+	"errors"
+	"fmt"
+	"time"
+
+	"github.com/BitBoxSwiss/bitbox02-api-go/api/firmware"
+	"github.com/flynn/noise"
+)
+
+// ConnectOptions tunes the bring-up. Zero values are sensible defaults.
+type ConnectOptions struct {
+	// HandshakeTimeout caps the wait for the simulator firmware to mark
+	// the pairing channel as device-confirmed. The simulator auto-
+	// confirms within a few hundred ms; 5s gives generous CI headroom.
+	HandshakeTimeout time.Duration
+	// Logger lets a caller route firmware-library logs somewhere; nil
+	// uses a silent logger.
+	Logger firmware.Logger
+}
+
+// Connect drives the post-Launch bring-up: firmware.NewDevice → Init →
+// poll ChannelHash → ChannelHashVerify. Returns a Device that is ready
+// to accept every BaselineScenarios call.
+//
+// The simulator regenerates its app-keypair every run (no persistence),
+// so we wire an in-memory ConfigInterface.
+func Connect(inst *Instance, opts ConnectOptions) (*firmware.Device, error) {
+	if inst == nil {
+		return nil, errors.New("simulator.Connect: nil Instance")
+	}
+	if inst.Comm == nil {
+		return nil, errors.New("simulator.Connect: Instance has no Comm")
+	}
+	timeout := opts.HandshakeTimeout
+	if timeout <= 0 {
+		timeout = 5 * time.Second
+	}
+	logger := opts.Logger
+	if logger == nil {
+		logger = noopLogger{}
+	}
+
+	dev := firmware.NewDevice(
+		nil, // version: query from device via OP_INFO
+		nil, // product: same
+		&MemoryConfig{},
+		inst.Comm,
+		logger,
+	)
+	if err := dev.Init(); err != nil {
+		return nil, fmt.Errorf("firmware.Device.Init: %w", err)
+	}
+
+	// Wait for the simulator firmware to mark the pairing as device-
+	// confirmed (auto-confirms within a few hundred ms). On a physical
+	// BitBox this would require the user to compare the channel hash and tap.
+	deadline := time.Now().Add(timeout)
+	for {
+		_, verified := dev.ChannelHash()
+		if verified {
+			dev.ChannelHashVerify(true)
+			return dev, nil
+		}
+		if time.Now().After(deadline) {
+			return nil, fmt.Errorf(
+				"firmware.Device: channel-hash never device-verified within %s — "+
+					"simulator should auto-confirm", timeout,
+			)
+		}
+		time.Sleep(100 * time.Millisecond)
+	}
+}
+
+// MemoryConfig is a minimal in-memory firmware.ConfigInterface. Suitable
+// for throw-away simulator runs where the noise keypair does not need
+// to survive the process.
+type MemoryConfig struct {
+	devicePubkeys [][]byte
+	appKey        *noise.DHKey
+}
+
+// ContainsDeviceStaticPubkey returns true if pubkey was previously added.
+func (c *MemoryConfig) ContainsDeviceStaticPubkey(pubkey []byte) bool {
+	for _, k := range c.devicePubkeys {
+		if string(k) == string(pubkey) {
+			return true
+		}
+	}
+	return false
+}
+
+// AddDeviceStaticPubkey records pubkey as trusted.
+func (c *MemoryConfig) AddDeviceStaticPubkey(pubkey []byte) error {
+	c.devicePubkeys = append(c.devicePubkeys, append([]byte(nil), pubkey...))
+	return nil
+}
+
+// GetAppNoiseStaticKeypair returns the persisted app keypair, or nil.
+func (c *MemoryConfig) GetAppNoiseStaticKeypair() *noise.DHKey { return c.appKey }
+
+// SetAppNoiseStaticKeypair persists the app keypair.
+func (c *MemoryConfig) SetAppNoiseStaticKeypair(key *noise.DHKey) error {
+	c.appKey = key
+	return nil
+}
+
+// noopLogger silences the firmware library's logging output.
+type noopLogger struct{}
+
+func (noopLogger) Error(string, error) {}
+func (noopLogger) Info(string)         {}
+func (noopLogger) Debug(string)        {}
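
The wait loop in `Connect` is a poll-until-deadline pattern. The sketch below isolates that pattern in a self-contained form so the control flow can be read without the firmware library. `pollUntil`, `errTimeout`, `demo`, and `demoTimeout` are hypothetical names, and the closure stands in for `dev.ChannelHash()`; none of this is part of the patch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is a hypothetical sentinel for "the condition never became true".
var errTimeout = errors.New("condition not met before deadline")

// pollUntil calls check every interval until it returns true or the timeout
// elapses. The condition is always checked at least once, and it is checked
// before the deadline test, matching the order used in Connect.
func pollUntil(check func() bool, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// demo simulates a device that reports "verified" on the third poll.
func demo() (int, error) {
	calls := 0
	verified := func() bool { calls++; return calls >= 3 }
	err := pollUntil(verified, 10*time.Millisecond, time.Second)
	return calls, err
}

// demoTimeout simulates a device that never confirms.
func demoTimeout() error {
	return pollUntil(func() bool { return false }, time.Millisecond, 20*time.Millisecond)
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)    // prints 3 <nil>
	fmt.Println(demoTimeout()) // prints condition not met before deadline
}
```

Checking the condition before the deadline means a confirmation that arrives on the final poll still counts as success, at the cost of overshooting the timeout by at most one interval.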
diff --git a/go/bitbox/simulator/integration_test.go b/go/bitbox/simulator/integration_test.go
index ce96966..6a6324c 100644
--- a/go/bitbox/simulator/integration_test.go
+++ b/go/bitbox/simulator/integration_test.go
@@ -11,18 +11,27 @@ import (
 	"github.com/DFXswiss/bitbox-testkit/go/bitbox/simulator"
 )
 
-// TestSimulatorRoundtrip launches the newest known BitBox02 simulator and
-// verifies basic connectivity. Gated by the `simulator` build tag so
-// `go test ./...` on a developer machine never triggers a binary download.
-func TestSimulatorRoundtrip(t *testing.T) {
-	cacheDir := os.Getenv("WALLET_TESTKIT_SIMCACHE")
-	if cacheDir == "" {
-		cacheDir = filepath.Join(os.TempDir(), "bitbox-testkit-simcache")
+// simCacheDir returns the directory simulator binaries are cached in,
+// honouring WALLET_TESTKIT_SIMCACHE so CI can pin a single download
+// path across all simulator-tagged tests.
+func simCacheDir(t *testing.T) string {
+	t.Helper()
+	dir := os.Getenv("WALLET_TESTKIT_SIMCACHE")
+	if dir == "" {
+		dir = filepath.Join(os.TempDir(), "bitbox-testkit-simcache")
 	}
-	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
+	if err := os.MkdirAll(dir, 0o755); err != nil {
 		t.Fatal(err)
 	}
-	inst, err := simulator.Launch(cacheDir)
+	return dir
+}
+
+// TestSimulatorRoundtrip launches the newest known BitBox02 simulator and
+// verifies basic connectivity. Cheaper than the full baseline so we
+// keep it as a fast smoke test that fails loudly if the binary cache or
+// TCP plumbing is broken.
+func TestSimulatorRoundtrip(t *testing.T) {
+	inst, err := simulator.Launch(simCacheDir(t))
 	if err != nil {
 		t.Fatalf("Launch: %v", err)
 	}
@@ -37,3 +46,49 @@ func TestSimulatorRoundtrip(t *testing.T) {
 	// Give the simulator a moment to be fully ready, then close cleanly.
 	time.Sleep(100 * time.Millisecond)
 }
+
+// TestSimulatorBaselineScenarios drives every scenario in BaselineScenarios
+// against the simulator firmware and fails the test for any non-pass.
+// This is the canonical "does the consumer-facing scenario set still
+// match the real firmware contract?" check — CI runs this on every
+// push so a firmware drift OR a scenario-logic regression in the
+// testkit itself surfaces as a single red signal.
+//
+// We do NOT use a sub-test per scenario: if any scenario fails, every
+// subsequent scenario likely fails too (the device may be in a bad
+// state after a failed RestoreFromMnemonic, for example). Reporting
+// the first failure plus the post-failure context is more useful than
+// burying it under N parallel red dots.
+func TestSimulatorBaselineScenarios(t *testing.T) {
+	inst, err := simulator.Launch(simCacheDir(t))
+	if err != nil {
+		t.Fatalf("Launch: %v", err)
+	}
+	t.Cleanup(inst.Stop)
+
+	dev, err := simulator.Connect(inst, simulator.ConnectOptions{})
+	if err != nil {
+		t.Fatalf("Connect: %v", err)
+	}
+
+	scenarios := simulator.BaselineScenarios()
+	if len(scenarios) == 0 {
+		t.Fatal("BaselineScenarios returned empty slice")
+	}
+
+	var firstFailure string
+	for _, sc := range scenarios {
+		res := sc(dev)
+		if res.Passed {
+			t.Logf("PASS %-44s (%dms)", res.Name, res.DurationMs)
+			continue
+		}
+		t.Errorf("FAIL %-44s (%dms) — %s", res.Name, res.DurationMs, res.Detail)
+		if firstFailure == "" {
+			firstFailure = res.Name
+		}
+	}
+	if firstFailure != "" {
+		t.Fatalf("first failing scenario: %s — see above for details", firstFailure)
+	}
+}
diff --git a/go/bitbox/simulator/scenarios.go b/go/bitbox/simulator/scenarios.go
index 5d05afb..874d826 100644
--- a/go/bitbox/simulator/scenarios.go
+++ b/go/bitbox/simulator/scenarios.go
@@ -7,6 +7,7 @@
 package simulator
 
 import (
+	"encoding/hex"
 	"errors"
 	"fmt"
 	"math/big"
@@ -96,25 +97,43 @@ const realUnitUserKycPayload = `{
 //
 // The simulator firmware accepts these calls without user interaction
 // because it auto-confirms every prompt; on a physical BitBox the
-// user would tap to confirm. The simulator is pre-loaded with a fixed
-// mnemonic ("boring mistake dish oyster truth pigeon viable emerge
-// sort crash wire portion cannon couple enact box walk height pull
-// today solid off enable tide") so address-derivation outputs are
-// deterministic.
+// user would tap to confirm. The simulator is pre-loaded with the
+// upstream fixture mnemonic so address-derivation outputs are
+// deterministic — the BIP-32 root fingerprint is 0x4c00739d for every
+// simulator session, which lets us pin exact xpubs / addresses across
+// runs (asserted by RootFingerprintDeterministic).
+//
+// Naming convention: where a scenario directly guards a known quirk
+// from quirks.json the function name and scenario id reference the
+// quirk id (E1, P2, CC-5 …) so a finding in CI maps unambiguously to
+// the documented anti-pattern.
 //
 // For each scenario we assert what the CONSUMER would see — error
 // class, byte shape, identity contract.
 func BaselineScenarios() []Scenario {
 	return []Scenario{
+		// Pairing + bring-up.
 		PairAndDeviceInfo,
 		RestoreSimulatorMnemonic,
+		RootFingerprintDeterministic,
+
+		// Ethereum address surface.
 		EthAddressMainnet,
 		EthAddressPolygonMultiByteV,
+
+		// Ethereum sign surface.
 		EthSignMessageAscii,
 		EthSignMessageBoundary,
+		EthSignLegacyPolygonMultiByteV,
 		EthSignEIP1559Mainnet,
 		EthSignTypedDataKycMultiPage,
 		EthSignTypedDataNonAsciiRejected,
+
+		// Bitcoin surface (BIP-84 native segwit + BIP-86 taproot).
+		BtcXpubZpubMainnet,
+		BtcAddressP2WPKHMainnet,
+		BtcAddressP2TRTaproot,
+		BtcSignMessageMainnet,
 	}
 }
 
@@ -354,6 +373,10 @@ func EthSignTypedDataKycMultiPage(dev *firmware.Device) Result {
 // (which would be a confusing partial-success path where the
 // consumer's transliteration becomes load-bearing for one firmware
 // version and dead code for the next).
+//
+// The literal non-ASCII bytes here are the actual test fixture — the
+// file-level audit-skip-file marker at the top of this file suppresses
+// the quirk-E1 self-audit hits these fixtures would otherwise produce.
 const realUnitUserKycPayloadWithUmlauts = `{
   "types": {
     "EIP712Domain": [
@@ -440,6 +463,235 @@ func EthSignTypedDataNonAsciiRejected(dev *firmware.Device) Result {
 	})
 }
 
+// simulatorRootFingerprint is the deterministic BIP-32 root fingerprint
+// the upstream BitBox02 simulator derives from its baked-in fixture
+// mnemonic. Captured upstream in firmware/secp256k1_test.go and
+// confirmed locally across simulator versions v9.19.0 – v9.26.1. A
+// mismatch here means EITHER upstream changed the fixture mnemonic
+// (which would invalidate every pinned-output assertion in this file)
+// OR our RestoreFromMnemonic step silently failed and the device is
+// running with a different seed.
+const simulatorRootFingerprint = "4c00739d"
+
+// RootFingerprintDeterministic asserts the BIP-32 root fingerprint
+// returned by the simulator matches the upstream fixture. This is the
+// load-bearing identity contract for every downstream scenario that
+// pins an exact address / xpub / signature byte — if this scenario
+// fails, treat every other pinned-output failure as a derived symptom.
+func RootFingerprintDeterministic(dev *firmware.Device) Result {
+	return run("root_fingerprint_deterministic", func() error {
+		fp, err := dev.RootFingerprint()
+		if err != nil {
+			return fmt.Errorf("RootFingerprint: %w", err)
+		}
+		got := hex.EncodeToString(fp)
+		if got != simulatorRootFingerprint {
+			return fmt.Errorf(
+				"simulator root fingerprint drift: expected %s, got %s — "+
+					"either RestoreFromMnemonic failed or upstream changed "+
+					"the fixture seed (re-confirm and update simulatorRootFingerprint)",
+				simulatorRootFingerprint, got,
+			)
+		}
+		return nil
+	})
+}
+
+// EthSignLegacyPolygonMultiByteV signs a legacy (pre-EIP-1559) Ethereum
+// transaction on Polygon (chainId=137). The existing
+// EthAddressPolygonMultiByteV only queries an address — addresses do
+// not depend on chainId, so that probe never actually exercises the
+// firmware's chain-id-in-v-byte path. THIS scenario does:
+//
+// EIP-155 encodes v = recId + 35 + 2 * chainId. For chainId=137 that
+// already exceeds 8 bits (35 + 2*137 = 309), forcing the firmware
+// and the consumer's RLP decoder to handle a multi-byte v. Quirk CC-5
+// (multi-byte v truncation) lives exactly here — a future firmware
+// regression that silently truncates v to one byte would fail this
+// scenario by returning a non-65-byte signature OR by returning a
+// v that, when summed with EIP-155 constants, doesn't round-trip.
+//
+// Note: the simulator's deprecated `coin` enum maps every chain id to
+// ETHCoin_ETH for firmware v9.10.0+ (which all our pinned versions
+// satisfy), so the simulator accepts chainId=137 directly.
+func EthSignLegacyPolygonMultiByteV(dev *firmware.Device) Result {
+	return run("eth_sign_legacy_polygon_multibyte_v", func() error {
+		recipient := [20]byte{
+			0x04, 0xf2, 0x64, 0xcf, 0x34, 0x44, 0x03, 0x13, 0xb4, 0xa0,
+			0x19, 0x2a, 0x35, 0x28, 0x14, 0xfb, 0xe9, 0x27, 0xb8, 0x85,
+		}
+		sig, err := dev.ETHSign(
+			137, // Polygon — 2*137+35=309, multi-byte v territory
+			[]uint32{44 + hardened, 60 + hardened, 0 + hardened, 0, 0},
+			0,                                                // nonce
+			new(big.Int).SetUint64(30_000_000_000),           // gasPrice (30 gwei)
+			21000,                                            // gasLimit
+			recipient,
+			new(big.Int).SetUint64(100_000_000_000_000_000), // 0.1 MATIC
+			nil,                                              // data
+			messages.ETHAddressCase_ETH_ADDRESS_CASE_MIXED,
+		)
+		if err != nil {
+			return fmt.Errorf("ETHSign(chainId=137): %w", err)
+		}
+		if len(sig) != 65 {
+			return fmt.Errorf("expected 65-byte sig, got %d", len(sig))
+		}
+		// For EIP-155 legacy sigs the SDK returns the raw 0/1 recId in
+		// the last byte; the consumer adds 35+2*chainId to produce the
+		// on-wire v. A returned byte outside {0,1} would indicate the
+		// firmware leaked an already-encoded v back through the SDK.
+		if sig[64] != 0x00 && sig[64] != 0x01 {
+			return fmt.Errorf("legacy ETH sign v byte must be 0 or 1 (raw recId), got 0x%02x", sig[64])
+		}
+		return nil
+	})
+}
+
+// btcMainnetCoin is the BIP-84 / BIP-86 mainnet coin enum reused
+// across the Bitcoin scenarios.
+const btcMainnetCoin = messages.BTCCoin_BTC
+
+// BtcXpubZpubMainnet derives a BIP-84 native-SegWit ZPUB at the
+// canonical account path m/84'/0'/0' and asserts it is well-formed
+// (zpub prefix, base58 length range). Because the simulator seed is
+// deterministic the value is stable across runs; we assert only the
+// shape here so a future BIP-32 library change in the simulator can
+// shift internal encoding without breaking the scenario.
+//
+// This probe also exercises the BTC pairing-state path on the firmware:
+// any consumer that requests BTC pubkeys directly after pairing (e.g.
+// dfx-wallet's planned BTC support) hits exactly this codepath.
+func BtcXpubZpubMainnet(dev *firmware.Device) Result {
+	return run("btc_xpub_zpub_mainnet", func() error {
+		xpub, err := dev.BTCXPub(
+			btcMainnetCoin,
+			// m/84'/0'/0' — BIP-84 account.
+			[]uint32{84 + hardened, 0 + hardened, 0 + hardened},
+			messages.BTCPubRequest_ZPUB,
+			false, // display=false (auto-confirm in simulator)
+		)
+		if err != nil {
+			return fmt.Errorf("BTCXPub(zpub): %w", err)
+		}
+		if !strings.HasPrefix(xpub, "zpub") {
+			return fmt.Errorf("expected zpub prefix, got %q", xpub)
+		}
+		// BIP-32 base58 extended keys are 111 chars long. Reject anything
+		// outside the canonical range — a shorter string means truncation,
+		// a longer one means embedded whitespace or BOM.
+		if len(xpub) < 108 || len(xpub) > 112 {
+			return fmt.Errorf("zpub length %d outside expected 108..112", len(xpub))
+		}
+		return nil
+	})
+}
+
+// BtcAddressP2WPKHMainnet derives a native-SegWit (bech32) receive
+// address at m/84'/0'/0'/0/0 and asserts the bc1q prefix + length
+// envelope (a P2WPKH address is exactly 42 chars: "bc1q" + 38 chars
+// of bech32 data).
+//
+// On a physical BitBox the user sees the bech32 string on the device
+// screen; the simulator auto-confirms. We do NOT pin the exact bech32
+// string because that would couple this scenario to the simulator's
+// specific seed library version — the prefix + length contract is the
+// stable surface.
+func BtcAddressP2WPKHMainnet(dev *firmware.Device) Result {
+	return run("btc_address_p2wpkh_mainnet", func() error {
+		addr, err := dev.BTCAddress(
+			btcMainnetCoin,
+			// m/84'/0'/0'/0/0 — BIP-84 first receive address.
+			[]uint32{84 + hardened, 0 + hardened, 0 + hardened, 0, 0},
+			firmware.NewBTCScriptConfigSimple(messages.BTCScriptConfig_P2WPKH),
+			false,
+		)
+		if err != nil {
+			return fmt.Errorf("BTCAddress(P2WPKH): %w", err)
+		}
+		if !strings.HasPrefix(addr, "bc1q") {
+			return fmt.Errorf("expected bc1q prefix for P2WPKH, got %q", addr)
+		}
+		// P2WPKH bech32 is 42 chars on mainnet ("bc1q" + 38 data).
+		if len(addr) != 42 {
+			return fmt.Errorf("expected 42-char P2WPKH bech32, got %d (%q)", len(addr), addr)
+		}
+		return nil
+	})
+}
+
+// BtcAddressP2TRTaproot derives a Taproot (BIP-86) address at
+// m/86'/0'/0'/0/0 and asserts the bc1p bech32m prefix. P2TR addresses
+// are 62 chars on mainnet ("bc1p" + 58 chars of bech32m).
+//
+// This exercises an entirely different firmware codepath than P2WPKH
+// (Taproot script construction is BIP-341 + key tweaking), so it
+// guards the broader BTC surface, not just a duplicate of the P2WPKH
+// probe.
+func BtcAddressP2TRTaproot(dev *firmware.Device) Result {
+	return run("btc_address_p2tr_taproot", func() error {
+		addr, err := dev.BTCAddress(
+			btcMainnetCoin,
+			// m/86'/0'/0'/0/0 — BIP-86 first receive address.
+			[]uint32{86 + hardened, 0 + hardened, 0 + hardened, 0, 0},
+			firmware.NewBTCScriptConfigSimple(messages.BTCScriptConfig_P2TR),
+			false,
+		)
+		if err != nil {
+			return fmt.Errorf("BTCAddress(P2TR): %w", err)
+		}
+		if !strings.HasPrefix(addr, "bc1p") {
+			return fmt.Errorf("expected bc1p prefix for P2TR, got %q", addr)
+		}
+		if len(addr) != 62 {
+			return fmt.Errorf("expected 62-char P2TR bech32m, got %d (%q)", len(addr), addr)
+		}
+		return nil
+	})
+}
+
+// BtcSignMessageMainnet signs a Bitcoin message under the BIP-322-ish
+// firmware path (BitBox02 signs the legacy "Bitcoin Signed Message"
+// preamble against the P2WPKH key) and asserts the returned envelope:
+//
+//   - sig 64 bytes (R||S)
+//   - recId 0..3
+//   - electrum 65-byte sig with header byte in {31, 32, 33, 34}
+//     (27 + 4 [compressed] + recId)
+//
+// This exercises the BTC sign codepath end-to-end (request → firmware
+// sign → antiklepto host-nonce exchange → response decode), which is
+// distinct from address derivation.
+func BtcSignMessageMainnet(dev *firmware.Device) Result {
+	return run("btc_sign_message_mainnet", func() error {
+		res, err := dev.BTCSignMessage(
+			btcMainnetCoin,
+			&messages.BTCScriptConfigWithKeypath{
+				ScriptConfig: firmware.NewBTCScriptConfigSimple(messages.BTCScriptConfig_P2WPKH),
+				Keypath:      []uint32{84 + hardened, 0 + hardened, 0 + hardened, 0, 0},
+			},
+			[]byte("hello bitbox"),
+		)
+		if err != nil {
+			return fmt.Errorf("BTCSignMessage: %w", err)
+		}
+		if len(res.Signature) != 64 {
+			return fmt.Errorf("expected 64-byte (R||S) sig, got %d", len(res.Signature))
+		}
+		if res.RecID > 3 {
+			return fmt.Errorf("recId must be 0..3, got %d", res.RecID)
+		}
+		if len(res.ElectrumSig65) != 65 {
+			return fmt.Errorf("expected 65-byte electrum sig, got %d", len(res.ElectrumSig65))
+		}
+		// Electrum header = 27 + 4 (compressed) + recId → must be 31..34.
+		if h := res.ElectrumSig65[0]; h < 31 || h > 34 {
+			return fmt.Errorf("electrum header byte must be 31..34, got %d", h)
+		}
+		return nil
+	})
+}
+
 // hardened adds the BIP-32 hardened-derivation flag bit. Inlined as a
 // const because the firmware API uses uint32 path elements directly.
 const hardened uint32 = 0x80000000
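
The arithmetic behind `EthSignLegacyPolygonMultiByteV` and the Electrum header check in `BtcSignMessageMainnet` can be verified in isolation. The following is a sketch under stated assumptions: `eip155V`, `recoverRecID`, and `electrumHeader` are hypothetical helpers written for illustration, not functions of bitbox02-api-go or of this testkit. They only restate the formulas quoted in the comments above (v = recId + 35 + 2 * chainId, header = 27 + 4 + recId).

```go
package main

import "fmt"

// eip155V turns the raw recovery id (0 or 1) into the on-wire EIP-155 v value.
func eip155V(recID uint8, chainID uint64) uint64 {
	return uint64(recID) + 35 + 2*chainID
}

// recoverRecID inverts eip155V. ok is false when v cannot belong to chainID,
// which is what a consumer would observe after a one-byte truncation of v.
func recoverRecID(v, chainID uint64) (recID uint8, ok bool) {
	base := 35 + 2*chainID
	if v < base || v > base+1 {
		return 0, false
	}
	return uint8(v - base), true
}

// electrumHeader builds the first byte of a 65-byte Electrum-style signature
// for a compressed public key: 27 + 4 + recId.
func electrumHeader(recID uint8) uint8 {
	return 27 + 4 + recID
}

func main() {
	for _, chainID := range []uint64{1, 137} {
		v := eip155V(1, chainID)
		fmt.Printf("chainId=%d v=%d fitsInOneByte=%t\n", chainID, v, v <= 0xff)
	}

	// Truncating v to a single byte drops the high bits: 309 becomes 53,
	// which no longer maps back to chainId 137.
	truncated := uint64(uint8(eip155V(0, 137)))
	_, ok := recoverRecID(truncated, 137)
	fmt.Println("truncated v:", truncated, "round-trips:", ok)

	fmt.Println("electrum header range:", electrumHeader(0), electrumHeader(3))
}
```

For chainId 1 the value stays at 37 or 38 and fits in one byte, so only a chain id of 111 or higher (35 + 2 * 111 = 257) exercises the multi-byte path. That is why the scenario signs on Polygon rather than mainnet.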
diff --git a/go/bitbox/simulator/simulator.go b/go/bitbox/simulator/simulator.go
index 2cb6b30..296ebfe 100644
--- a/go/bitbox/simulator/simulator.go
+++ b/go/bitbox/simulator/simulator.go
@@ -68,11 +68,26 @@ func (i *Instance) Stop() {
 // cacheDir is where downloaded binaries live; reuse it across tests to
 // avoid re-downloading.
 func Launch(cacheDir string) (*Instance, error) {
+	return LaunchVersion(cacheDir, "")
+}
+
+// LaunchVersion is Launch with an explicit binary version. Pass the
+// `Name` field of one of Simulators() (e.g. "bitbox02-multi-9.21.0")
+// or an empty string for the newest known build. The BITBOX_SIMULATOR
+// env override (absolute path to a binary on disk) takes precedence
+// over this argument — that lets a developer drop in a local debug
+// build without needing to extend the embedded list.
+//
+// Returns ErrSimulatorNotFound if the name does not match any
+// embedded entry, which is a deliberately distinct error from
+// ErrUnsupportedPlatform so the CLI's --firmware flag can give a
+// helpful "did you mean…" hint.
+func LaunchVersion(cacheDir, name string) (*Instance, error) {
 	if runtime.GOOS != "linux" || runtime.GOARCH != "amd64" {
 		return nil, ErrUnsupportedPlatform
 	}
 
-	path, err := resolveBinary(cacheDir)
+	path, err := resolveBinary(cacheDir, name)
 	if err != nil {
 		return nil, err
 	}
@@ -103,7 +118,11 @@ func Launch(cacheDir string) (*Instance, error) {
 	}, nil
 }
 
-func resolveBinary(cacheDir string) (string, error) {
+// ErrSimulatorNotFound is returned when LaunchVersion is given a name
+// that does not appear in Simulators().
+var ErrSimulatorNotFound = errors.New("bitbox/simulator: requested version not in embedded list")
+
+func resolveBinary(cacheDir, name string) (string, error) {
 	if override := os.Getenv("BITBOX_SIMULATOR"); override != "" {
 		abs, err := filepath.Abs(override)
 		if err != nil {
@@ -123,5 +142,36 @@ func resolveBinary(cacheDir string) (string, error) {
 	if len(bins) == 0 {
 		return "", errors.New("bitbox/simulator: no embedded simulator list")
 	}
-	return cache.Resolve(bins[0])
+	if name == "" {
+		return cache.Resolve(bins[0])
+	}
+	for _, b := range bins {
+		if b.Name == name {
+			return cache.Resolve(b)
+		}
+	}
+	return "", fmt.Errorf("%w: %q (have: %s)", ErrSimulatorNotFound, name, listNames(bins))
+}
+
+// listNames renders the embedded names as a comma-joined string for
+// error messages.
+func listNames(bins []coresim.Binary) string {
+	names := make([]string, len(bins))
+	for i, b := range bins {
+		names[i] = b.Name
+	}
+	return joinNames(names)
+}
+
+// joinNames is a minimal stand-in for strings.Join so the simulator
+// package does not pull in "strings" purely for an error helper.
+func joinNames(names []string) string {
+	out := ""
+	for i, n := range names {
+		if i > 0 {
+			out += ", "
+		}
+		out += n
+	}
+	return out
 }
diff --git a/go/cmd/bitbox-simulator-check/main.go b/go/cmd/bitbox-simulator-check/main.go
index 206fe48..fbd8992 100644
--- a/go/cmd/bitbox-simulator-check/main.go
+++ b/go/cmd/bitbox-simulator-check/main.go
@@ -14,6 +14,8 @@
 //	bitbox-simulator-check --output report.md       # write to file
 //	bitbox-simulator-check --cache ~/.bitbox-cache  # reuse downloaded binaries
 //	bitbox-simulator-check --fail-on-skip           # treat skip as failure
+//	bitbox-simulator-check --firmware bitbox02-multi-9.21.0  # specific build
+//	bitbox-simulator-check --firmware all           # matrix: every embedded build
 //
 // Exit codes:
 //
@@ -32,8 +34,6 @@ import (
 	"runtime"
 	"time"
 
-	"github.com/BitBoxSwiss/bitbox02-api-go/api/firmware"
-	"github.com/flynn/noise"
 	"github.com/DFXswiss/bitbox-testkit/go/bitbox/simulator"
 )
 
@@ -42,6 +42,7 @@ func main() {
 	output := flag.String("output", "", "Write report to file instead of stdout.")
 	cacheDir := flag.String("cache", "", "Simulator-binary cache dir (default: $TMPDIR/bitbox-testkit-simcache).")
 	failOnSkip := flag.Bool("fail-on-skip", false, "Exit nonzero if scenarios were skipped (non-Linux host).")
+	firmware := flag.String("firmware", "", "Specific embedded firmware name (e.g. bitbox02-multi-9.21.0), or \"all\" for matrix. Default: newest.")
 	version := flag.Bool("version", false, "Print version and exit.")
 	flag.Parse()
 
@@ -55,19 +56,19 @@ func main() {
 		os.Exit(3)
 	}
 
-	report := buildReport(*cacheDir, *failOnSkip)
+	matrix := buildMatrixReport(*cacheDir, *failOnSkip, *firmware)
 
 	var rendered []byte
 	switch *format {
 	case "json":
-		b, err := json.MarshalIndent(report, "", "  ")
+		b, err := json.MarshalIndent(matrix, "", "  ")
 		if err != nil {
 			fmt.Fprintln(os.Stderr, "marshal:", err)
 			os.Exit(2)
 		}
 		rendered = append(b, '\n')
 	default:
-		rendered = []byte(renderMarkdown(report))
+		rendered = []byte(renderMatrixMarkdown(matrix))
 	}
 
 	if *output != "" {
@@ -79,10 +80,24 @@ func main() {
 		_, _ = os.Stdout.Write(rendered)
 	}
 
-	os.Exit(report.ExitCode)
+	os.Exit(matrix.ExitCode)
 }
 
-// Report is the JSON-serialisable summary of a simulator run.
+// MatrixReport wraps one or more per-firmware reports. For the common
+// single-firmware run only the first element of Reports is populated;
+// the wrapper still gives consumers (CI parser, downstream tooling) a
+// stable shape that scales from N=1 to N=many without a schema fork.
+type MatrixReport struct {
+	Host     string    `json:"host"`
+	Started  time.Time `json:"started"`
+	Finished time.Time `json:"finished"`
+	Reports  []Report  `json:"reports"`
+	// ExitCode is the rollup: max(individual exit codes). A single
+	// failed scenario in any firmware tips the whole matrix red.
+	ExitCode int `json:"exit_code"`
+}
+
+// Report is the JSON-serialisable summary of a single simulator run.
 type Report struct {
 	Host       string              `json:"host"`
 	Skipped    bool                `json:"skipped"`
@@ -102,7 +117,38 @@ type Summary struct {
 	Failed int `json:"failed"`
 }
 
-func buildReport(cacheDirFlag string, failOnSkip bool) Report {
+// resolveFirmwareList expands the --firmware flag into the list of
+// binary names to run against. "" → just the newest. "all" → every
+// embedded binary. Anything else is treated as a single explicit name.
+func resolveFirmwareList(firmware string) []string {
+	if firmware == "all" {
+		bins := simulator.Simulators()
+		out := make([]string, len(bins))
+		for i, b := range bins {
+			out[i] = b.Name
+		}
+		return out
+	}
+	return []string{firmware}
+}
+
+func buildMatrixReport(cacheDirFlag string, failOnSkip bool, firmware string) MatrixReport {
+	started := time.Now()
+	host := fmt.Sprintf("%s/%s", runtime.GOOS, runtime.GOARCH)
+	matrix := MatrixReport{Host: host, Started: started}
+
+	for _, name := range resolveFirmwareList(firmware) {
+		r := buildReport(cacheDirFlag, failOnSkip, name)
+		matrix.Reports = append(matrix.Reports, r)
+		if r.ExitCode > matrix.ExitCode {
+			matrix.ExitCode = r.ExitCode
+		}
+	}
+	matrix.Finished = time.Now()
+	return matrix
+}
+
+func buildReport(cacheDirFlag string, failOnSkip bool, firmwareName string) Report {
 	started := time.Now()
 	host := fmt.Sprintf("%s/%s", runtime.GOOS, runtime.GOARCH)
 
@@ -129,51 +175,29 @@ func buildReport(cacheDirFlag string, failOnSkip bool) Report {
 		return failed(started, host, fmt.Errorf("mkdir cache: %w", err))
 	}
 
-	inst, err := simulator.Launch(cacheDir)
+	inst, err := simulator.LaunchVersion(cacheDir, firmwareName)
 	if err != nil {
-		return failed(started, host, fmt.Errorf("simulator.Launch: %w", err))
+		return failed(started, host, fmt.Errorf("simulator.LaunchVersion(%q): %w", firmwareName, err))
 	}
 	defer inst.Stop()
 
-	// The firmware client expects a Communication, a ConfigInterface
-	// (for persisting Noise keys across sessions — we keep them in-mem
-	// since the simulator is throw-away), and a Logger.
-	dev := firmware.NewDevice(
-		nil, // version: query from device via OP_INFO (firmware ≥ 4.3.0)
-		nil, // product: same
-		&memoryConfig{},
-		inst.Comm,
-		noopLogger{},
-	)
-	if err := dev.Init(); err != nil {
-		return failed(started, host, fmt.Errorf("firmware.Device.Init: %w", err))
+	dev, err := simulator.Connect(inst, simulator.ConnectOptions{})
+	if err != nil {
+		return failed(started, host, err)
 	}
 
-	// Noise XX handshake completed; now wait for the simulator firmware
-	// to mark the pairing as device-confirmed (it auto-confirms within
-	// a few hundred ms — on a physical BitBox this would require the
-	// user to compare + tap on the device screen). Once confirmed, we
-	// acknowledge from the app side via ChannelHashVerify(true), and
-	// the firmware unlocks the rest of the API surface.
-	deadline := time.Now().Add(5 * time.Second)
-	for {
-		_, verified := dev.ChannelHash()
-		if verified {
-			dev.ChannelHashVerify(true)
-			break
-		}
-		if time.Now().After(deadline) {
-			return failed(started, host, fmt.Errorf(
-				"firmware.Device: channel-hash never device-verified within 5s — the simulator should auto-confirm",
-			))
-		}
-		time.Sleep(100 * time.Millisecond)
+	// Resolve the actual firmware name we ended up running. For ""
+	// this is the newest embedded; for an explicit value it is just
+	// that value.
+	resolvedName := firmwareName
+	if resolvedName == "" {
+		resolvedName = simulator.Simulators()[0].Name
 	}
 
 	report := Report{
 		Host:     host,
 		Started:  started,
-		Firmware: simulator.Simulators()[0].Name,
+		Firmware: resolvedName,
 	}
 	for _, scenario := range simulator.BaselineScenarios() {
 		res := scenario(dev)
@@ -209,6 +233,34 @@ func failed(started time.Time, host string, err error) Report {
 	}
 }
 
+func renderMatrixMarkdown(m MatrixReport) string {
+	if len(m.Reports) == 0 {
+		return "# BitBox02 simulator check\n\n(no firmware reports — check flags)\n"
+	}
+	if len(m.Reports) == 1 {
+		return renderMarkdown(m.Reports[0])
+	}
+	// Matrix render: one section per firmware + a rolled-up header.
+	out := "# BitBox02 simulator check — firmware matrix\n\n"
+	out += fmt.Sprintf("Host: `%s` — Started: %s — Total duration: %s — Firmware tested: %d\n\n",
+		m.Host, m.Started.Format(time.RFC3339),
+		m.Finished.Sub(m.Started).Round(time.Millisecond), len(m.Reports))
+	out += "| Firmware | Passed | Failed | Total | Duration |\n"
+	out += "|---|---:|---:|---:|---:|\n"
+	for _, r := range m.Reports {
+		out += fmt.Sprintf("| `%s` | %d | %d | %d | %s |\n",
+			r.Firmware, r.Summary.Passed, r.Summary.Failed, r.Summary.Total,
+			r.Finished.Sub(r.Started).Round(time.Millisecond))
+	}
+	out += "\n"
+	for _, r := range m.Reports {
+		out += "## " + r.Firmware + "\n\n"
+		out += renderMarkdown(r)
+		out += "\n"
+	}
+	return out
+}
+
 func renderMarkdown(r Report) string {
 	out := "# BitBox02 simulator check\n\n"
 	out += fmt.Sprintf("Host: `%s` — Started: %s — Duration: %s\n\n",
@@ -236,38 +288,3 @@ func renderMarkdown(r Report) string {
 		r.Summary.Total, r.Summary.Passed, r.Summary.Failed)
 	return out
 }
-
-// memoryConfig is a minimal in-memory ConfigInterface. The simulator
-// regenerates the app keypair every run because the kept-state never
-// survives the process exit; that's fine for a CI run.
-type memoryConfig struct {
-	devicePubkeys [][]byte
-	appKey        *noise.DHKey
-}
-
-func (c *memoryConfig) ContainsDeviceStaticPubkey(pubkey []byte) bool {
-	for _, k := range c.devicePubkeys {
-		if string(k) == string(pubkey) {
-			return true
-		}
-	}
-	return false
-}
-func (c *memoryConfig) AddDeviceStaticPubkey(pubkey []byte) error {
-	c.devicePubkeys = append(c.devicePubkeys, append([]byte(nil), pubkey...))
-	return nil
-}
-func (c *memoryConfig) GetAppNoiseStaticKeypair() *noise.DHKey { return c.appKey }
-func (c *memoryConfig) SetAppNoiseStaticKeypair(key *noise.DHKey) error {
-	c.appKey = key
-	return nil
-}
-
-// noopLogger silences the firmware library's logging output. The CLI
-// captures pass/fail per scenario, so a debug stream from the library
-// would just be noise.
-type noopLogger struct{}
-
-func (noopLogger) Error(string, error) {}
-func (noopLogger) Info(string)         {}
-func (noopLogger) Debug(string)        {}
diff --git a/go/cmd/release-version/main.go b/go/cmd/release-version/main.go
new file mode 100644
index 0000000..ac93383
--- /dev/null
+++ b/go/cmd/release-version/main.go
@@ -0,0 +1,317 @@
+// Command release-version reads commit subjects since a base ref and
+// decides the next semantic-version bump according to Conventional
+// Commits 1.0 — emitting the next "vMAJOR.MINOR.PATCH" tag (or, with
+// --report, a human-readable explanation) on stdout.
+//
+// Designed to be called from the auto-tag CI workflow:
+//
+//	NEXT=$(go run ./cmd/release-version --base "$LATEST_TAG")
+//	git tag -a "$NEXT" -m "Release $NEXT"
+//
+// Conventional Commits → semver mapping (see CONTRIBUTING.md "Releases"):
+//
+//	feat!:, fix!:, refactor!:, ...        → MAJOR
+//	BREAKING CHANGE: in body              → MAJOR
+//	feat:, feat(scope):                   → MINOR
+//	fix:, perf:, refactor:                → PATCH
+//	chore:, ci:, docs:, test:, style:,    → PATCH (defensive default)
+//	build:, revert:                       → PATCH
+//	(unrecognised subject)                → PATCH + warning on stderr
+//
+// Aggregate over every commit in the range: pick the highest bump
+// encountered. A merge commit subject ("Merge pull request #N from …")
+// is ignored — only the squash-style commits the PR actually
+// contributed are read. Empty ranges report exit 4 ("no commits, no
+// release") so the caller can short-circuit.
+//
+// Exit codes:
+//
+//	0  success — wrote next tag to stdout
+//	2  invalid CLI flags or git invocation failed (e.g. base ref not found)
+//	3  base is not a parseable vMAJOR.MINOR.PATCH tag
+//	4  no commits in the range (caller should skip the release step)
+package main
+
+import (
+	"errors"
+	"flag"
+	"fmt"
+	"os"
+	"os/exec"
+	"regexp"
+	"strconv"
+	"strings"
+)
+
+func main() {
+	base := flag.String("base", "", "Base ref (latest tag). Range = base..HEAD. Empty = treat as initial release.")
+	head := flag.String("head", "HEAD", "Head ref. Default HEAD.")
+	report := flag.Bool("report", false, "Print a human-readable summary to stdout instead of just the version.")
+	initial := flag.String("initial", "v0.1.0", "Tag to emit when base is empty (no prior tags).")
+	flag.Parse()
+
+	if err := run(*base, *head, *report, *initial, os.Stdout, os.Stderr); err != nil {
+		exitCode := 2
+		var ce codedError
+		if errors.As(err, &ce) {
+			exitCode = ce.code
+		}
+		fmt.Fprintln(os.Stderr, "release-version:", err)
+		os.Exit(exitCode)
+	}
+}
+
+// codedError wraps an error together with the process exit code to report.
+type codedError struct {
+	code int
+	err  error
+}
+
+func (e codedError) Error() string { return e.err.Error() }
+func (e codedError) Unwrap() error { return e.err }
+
+func coded(code int, err error) error { return codedError{code: code, err: err} }
+
+func run(base, head string, report bool, initial string, stdout, stderr *os.File) error {
+	if base == "" {
+		fmt.Fprintln(stdout, initial)
+		if report {
+			fmt.Fprintln(stdout, "(no prior tag — emitting initial)")
+		}
+		return nil
+	}
+
+	current, err := parseSemver(base)
+	if err != nil {
+		return coded(3, fmt.Errorf("parse base %q: %w", base, err))
+	}
+
+	commits, err := gitLog(base, head)
+	if err != nil {
+		return coded(2, err)
+	}
+	if len(commits) == 0 {
+		return coded(4, fmt.Errorf("no commits between %s..%s — no release", base, head))
+	}
+
+	decision := decideBump(commits, stderr)
+	next := applyBump(current, decision.Bump)
+
+	fmt.Fprintln(stdout, next)
+	if report {
+		fmt.Fprintln(stdout)
+		fmt.Fprintln(stdout, decision.Report())
+	}
+	return nil
+}
+
+// gitLog returns the non-merge commits in base..head. Within one record
+// the subject and body are separated by a NUL byte (\x00); records are
+// separated by \x1e (record separator) so multi-paragraph bodies stay
+// intact.
+func gitLog(base, head string) ([]Commit, error) {
+	rng := base + ".." + head
+	cmd := exec.Command("git", "log", "--no-merges",
+		"--pretty=format:%s%x00%b%x1e", rng)
+	out, err := cmd.Output()
+	if err != nil {
+		return nil, fmt.Errorf("git log %s: %w", rng, err)
+	}
+	return parseLog(string(out)), nil
+}
+
+// parseLog splits the git log output into Commit records. It is kept
+// separate from gitLog so tests can feed it pre-recorded fixture strings.
+func parseLog(s string) []Commit {
+	var out []Commit
+	for _, rec := range strings.Split(s, "\x1e") {
+		rec = strings.Trim(rec, "\n")
+		if rec == "" {
+			continue
+		}
+		// One record = subject \x00 body
+		parts := strings.SplitN(rec, "\x00", 2)
+		c := Commit{Subject: strings.TrimSpace(parts[0])}
+		if len(parts) == 2 {
+			c.Body = strings.TrimSpace(parts[1])
+		}
+		out = append(out, c)
+	}
+	return out
+}
+
+// Commit is one log record.
+type Commit struct {
+	Subject string
+	Body    string
+}
+
+// Bump enumerates the semver bump levels.
+type Bump int
+
+const (
+	BumpNone Bump = iota
+	BumpPatch
+	BumpMinor
+	BumpMajor
+)
+
+func (b Bump) String() string {
+	switch b {
+	case BumpMajor:
+		return "major"
+	case BumpMinor:
+		return "minor"
+	case BumpPatch:
+		return "patch"
+	}
+	return "none"
+}
+
+// subjectPattern matches Conventional Commits subject lines:
+//
+//	type(scope)!: message
+//	^^^^       ^
+//	group 1    group 2 (the breaking "!"; the scope is not captured)
+var subjectPattern = regexp.MustCompile(`^(\w+)(?:\([^)]+\))?(!)?:\s+\S`)
+
+// breakingPattern matches "BREAKING CHANGE:" (or "BREAKING-CHANGE:")
+// at the start of any line in the commit body; the spec allows both spellings.
+var breakingPattern = regexp.MustCompile(`(?m)^BREAKING[ -]CHANGE:`)
+
+// decideBump aggregates per-commit bumps and returns the highest.
+func decideBump(commits []Commit, warnOut *os.File) Decision {
+	d := Decision{TotalCommits: len(commits)}
+	for _, c := range commits {
+		b, why := classify(c)
+		d.PerCommit = append(d.PerCommit, CommitBump{Commit: c, Bump: b, Reason: why})
+		if b > d.Bump {
+			d.Bump = b
+		}
+		switch b {
+		case BumpMajor:
+			d.MajorCount++
+		case BumpMinor:
+			d.MinorCount++
+		case BumpPatch:
+			d.PatchCount++
+		}
+		if why == reasonUnrecognised && warnOut != nil {
+			fmt.Fprintf(warnOut,
+				"release-version: warning — non-conventional subject %q, treating as patch\n",
+				c.Subject)
+		}
+	}
+	if d.Bump == BumpNone && d.TotalCommits > 0 {
+		// All commits were "no bump" classified (currently unreachable
+		// because unrecognised falls back to patch, but defensive).
+		d.Bump = BumpPatch
+	}
+	return d
+}
+
+const (
+	reasonBreakingSuffix = "subject contains '!:' breaking suffix"
+	reasonBreakingBody   = "body contains BREAKING CHANGE: footer"
+	reasonFeat           = "feat: subject"
+	reasonFix            = "fix/perf/refactor/etc subject"
+	reasonNoOp           = "chore/ci/docs/test/style/build subject (patch-only categories)"
+	reasonUnrecognised   = "non-conventional subject — defaulting to patch"
+)
+
+// classify decides the bump level for a single commit, returning the
+// human-readable reason for the report.
+func classify(c Commit) (Bump, string) {
+	if breakingPattern.MatchString(c.Body) {
+		return BumpMajor, reasonBreakingBody
+	}
+	m := subjectPattern.FindStringSubmatch(c.Subject)
+	if m == nil {
+		return BumpPatch, reasonUnrecognised
+	}
+	typ := strings.ToLower(m[1])
+	breaking := m[2] == "!"
+	if breaking {
+		return BumpMajor, reasonBreakingSuffix
+	}
+	switch typ {
+	case "feat":
+		return BumpMinor, reasonFeat
+	case "fix", "perf", "refactor", "revert":
+		return BumpPatch, reasonFix
+	default:
+		// chore, ci, docs, test, style, build — patch-only categories.
+		// Still bump patch so the release isn't completely missed.
+		return BumpPatch, reasonNoOp
+	}
+}
+
+// Decision is the aggregated outcome over all commits in the range.
+type Decision struct {
+	TotalCommits int
+	MajorCount   int
+	MinorCount   int
+	PatchCount   int
+	Bump         Bump
+	PerCommit    []CommitBump
+}
+
+// CommitBump is one commit's decision.
+type CommitBump struct {
+	Commit Commit
+	Bump   Bump
+	Reason string
+}
+
+// Report renders a multi-line summary suitable for CI logs.
+func (d Decision) Report() string {
+	var b strings.Builder
+	fmt.Fprintf(&b, "commits analysed: %d (major:%d minor:%d patch:%d)\n",
+		d.TotalCommits, d.MajorCount, d.MinorCount, d.PatchCount)
+	fmt.Fprintf(&b, "winning bump: %s\n\n", d.Bump)
+	fmt.Fprintln(&b, "per-commit breakdown:")
+	for _, cb := range d.PerCommit {
+		// Truncate subject to keep CI logs readable; full subject is in
+		// the git history if a maintainer needs it.
+		subj := cb.Commit.Subject
+		if r := []rune(subj); len(r) > 72 {
+			subj = string(r[:69]) + "..."
+		}
+		fmt.Fprintf(&b, "  [%s] %s — %s\n", cb.Bump, subj, cb.Reason)
+	}
+	return b.String()
+}
+
+// applyBump computes the next semver from current + bump.
+func applyBump(cur Semver, b Bump) string {
+	switch b {
+	case BumpMajor:
+		return fmt.Sprintf("v%d.0.0", cur.Major+1)
+	case BumpMinor:
+		return fmt.Sprintf("v%d.%d.0", cur.Major, cur.Minor+1)
+	default:
+		return fmt.Sprintf("v%d.%d.%d", cur.Major, cur.Minor, cur.Patch+1)
+	}
+}
+
+// Semver is a parsed major.minor.patch.
+type Semver struct {
+	Major, Minor, Patch int
+}
+
+var semverPattern = regexp.MustCompile(`^v?(\d+)\.(\d+)\.(\d+)$`)
+
+// parseSemver accepts "v0.4.6" or "0.4.6". Pre-release / build metadata
+// is intentionally NOT supported here — the auto-tag flow only deals in
+// release tags, not pre-releases.
+func parseSemver(s string) (Semver, error) {
+	m := semverPattern.FindStringSubmatch(s)
+	if m == nil {
+		return Semver{}, fmt.Errorf("not a vMAJOR.MINOR.PATCH tag: %q", s)
+	}
+	major, _ := strconv.Atoi(m[1])
+	minor, _ := strconv.Atoi(m[2])
+	patch, _ := strconv.Atoi(m[3])
+	return Semver{Major: major, Minor: minor, Patch: patch}, nil
+}
+
diff --git a/go/cmd/release-version/main_test.go b/go/cmd/release-version/main_test.go
new file mode 100644
index 0000000..9fae226
--- /dev/null
+++ b/go/cmd/release-version/main_test.go
@@ -0,0 +1,238 @@
+package main
+
+import (
+	"strings"
+	"testing"
+)
+
+// TestClassify locks the Conventional Commits → bump mapping. Every
+// row here is a contract the release-version tool ships to consumers;
+// changing one without updating CONTRIBUTING.md "Releases" is a bug.
+func TestClassify(t *testing.T) {
+	tests := []struct {
+		name    string
+		subject string
+		body    string
+		want    Bump
+		reason  string
+	}{
+		// MAJOR (breaking)
+		{"feat with ! suffix", "feat!: drop legacy API", "", BumpMajor, reasonBreakingSuffix},
+		{"fix with ! suffix", "fix!: invert error code semantics", "", BumpMajor, reasonBreakingSuffix},
+		{"feat with scope and ! suffix", "feat(api)!: rename endpoint", "", BumpMajor, reasonBreakingSuffix},
+		{"chore with ! suffix", "chore!: bump go.mod requires go1.25", "", BumpMajor, reasonBreakingSuffix},
+
+		// MAJOR (BREAKING CHANGE: footer)
+		{"BREAKING CHANGE colon-space", "fix: small thing", "More text.\n\nBREAKING CHANGE: removed Foo()", BumpMajor, reasonBreakingBody},
+		{"BREAKING-CHANGE hyphen", "fix: small thing", "BREAKING-CHANGE: dropped Bar", BumpMajor, reasonBreakingBody},
+		{"BREAKING CHANGE mid-body", "feat: a", "preamble\nBREAKING CHANGE: foo\ntrailer", BumpMajor, reasonBreakingBody},
+		// "BREAKING" alone, NOT a footer, doesn't trip the rule.
+		{"plain BREAKING word in body", "fix: nothing", "the work is BREAKING ground", BumpPatch, reasonFix},
+
+		// MINOR (feat)
+		{"plain feat", "feat: new scenario", "", BumpMinor, reasonFeat},
+		{"feat with scope", "feat(simulator): add BTC scenarios", "", BumpMinor, reasonFeat},
+		{"feat with multi-word scope", "feat(go cmd): blah", "", BumpMinor, reasonFeat},
+		{"feat case-insensitive type", "FEAT: capital type", "", BumpMinor, reasonFeat},
+
+		// PATCH (fix/perf/refactor/revert)
+		{"plain fix", "fix: address bug", "", BumpPatch, reasonFix},
+		{"fix with scope", "fix(audit): suppress doc-comment false positive", "", BumpPatch, reasonFix},
+		{"perf", "perf: avoid allocation in hot path", "", BumpPatch, reasonFix},
+		{"refactor", "refactor: extract helper", "", BumpPatch, reasonFix},
+		{"revert", "revert: undo bad change", "", BumpPatch, reasonFix},
+
+		// PATCH (no-op-but-still-shipped categories)
+		{"chore", "chore: dep bump", "", BumpPatch, reasonNoOp},
+		{"ci", "ci: cache go modules", "", BumpPatch, reasonNoOp},
+		{"docs", "docs: clarify CONTRIBUTING", "", BumpPatch, reasonNoOp},
+		{"test", "test: cover edge case", "", BumpPatch, reasonNoOp},
+		{"style", "style: gofmt", "", BumpPatch, reasonNoOp},
+		{"build", "build: update Makefile", "", BumpPatch, reasonNoOp},
+
+		// Unrecognised → patch + warning (the warning side is checked
+		// in TestDecideBumpWarnsOnUnrecognised below).
+		{"unrecognised no colon", "wat", "", BumpPatch, reasonUnrecognised},
+		{"missing colon", "feat new thing", "", BumpPatch, reasonUnrecognised},
+		{"colon but no message", "feat:", "", BumpPatch, reasonUnrecognised},
+		{"colon but only whitespace after", "feat:   ", "", BumpPatch, reasonUnrecognised},
+		{"odd prefix", "FEATURE: too verbose", "", BumpPatch, reasonNoOp}, // "FEATURE" parses as type → falls to default arm
+
+		// Body without breaking footer doesn't promote.
+		{"feat with normal body", "feat: thing", "we did the thing.", BumpMinor, reasonFeat},
+		{"fix with body referencing breaks", "fix: nothing", "before this change tests were breaking, now fixed", BumpPatch, reasonFix},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got, why := classify(Commit{Subject: tt.subject, Body: tt.body})
+			if got != tt.want {
+				t.Errorf("classify(%q, body=%q) = %s, want %s (reason: %s)",
+					tt.subject, tt.body, got, tt.want, why)
+			}
+			if why != tt.reason {
+				t.Errorf("classify(%q) reason = %q, want %q", tt.subject, why, tt.reason)
+			}
+		})
+	}
+}
+
+// TestDecideBumpPicksHighest verifies the aggregator picks the largest
+// bump across a range, not "the last one wins" or anything similar.
+func TestDecideBumpPicksHighest(t *testing.T) {
+	commits := []Commit{
+		{Subject: "chore: dep bump"},
+		{Subject: "fix: small thing"},
+		{Subject: "feat: new scenario"}, // this one promotes the whole range to MINOR
+		{Subject: "docs: update README"},
+	}
+	d := decideBump(commits, nil)
+	if d.Bump != BumpMinor {
+		t.Fatalf("decideBump = %s, want minor", d.Bump)
+	}
+	if d.TotalCommits != 4 {
+		t.Errorf("TotalCommits = %d, want 4", d.TotalCommits)
+	}
+	if d.MinorCount != 1 {
+		t.Errorf("MinorCount = %d, want 1", d.MinorCount)
+	}
+	if d.PatchCount != 3 {
+		t.Errorf("PatchCount = %d, want 3", d.PatchCount)
+	}
+}
+
+// TestDecideBumpBreakingWins verifies that a single breaking change in
+// a sea of patches still bumps major.
+func TestDecideBumpBreakingWins(t *testing.T) {
+	commits := []Commit{
+		{Subject: "fix: a"},
+		{Subject: "fix: b"},
+		{Subject: "feat!: remove deprecated API"},
+		{Subject: "chore: c"},
+	}
+	d := decideBump(commits, nil)
+	if d.Bump != BumpMajor {
+		t.Fatalf("decideBump = %s, want major", d.Bump)
+	}
+	if d.MajorCount != 1 {
+		t.Errorf("MajorCount = %d, want 1", d.MajorCount)
+	}
+}
+
+// TestApplyBumpMatrix locks the SemVer math.
+func TestApplyBumpMatrix(t *testing.T) {
+	cur := Semver{Major: 0, Minor: 4, Patch: 6}
+	tests := []struct {
+		bump Bump
+		want string
+	}{
+		{BumpPatch, "v0.4.7"},
+		{BumpMinor, "v0.5.0"},
+		{BumpMajor, "v1.0.0"},
+		{BumpNone, "v0.4.7"}, // defensive — falls to patch path
+	}
+	for _, tt := range tests {
+		got := applyBump(cur, tt.bump)
+		if got != tt.want {
+			t.Errorf("applyBump(%v, %s) = %q, want %q", cur, tt.bump, got, tt.want)
+		}
+	}
+}
+
+// TestApplyBumpResetsLowerComponents — a minor bump zeroes patch, a
+// major bump zeroes minor AND patch.
+func TestApplyBumpResetsLowerComponents(t *testing.T) {
+	cur := Semver{Major: 1, Minor: 7, Patch: 3}
+	if got := applyBump(cur, BumpMinor); got != "v1.8.0" {
+		t.Errorf("minor bump from 1.7.3 = %q, want v1.8.0", got)
+	}
+	if got := applyBump(cur, BumpMajor); got != "v2.0.0" {
+		t.Errorf("major bump from 1.7.3 = %q, want v2.0.0", got)
+	}
+}
+
+// TestParseSemver accepts the v-prefix or bare form; rejects everything
+// else loudly so the caller can't silently feed a tag that the auto-
+// tag script can't increment.
+func TestParseSemver(t *testing.T) {
+	good := map[string]Semver{
+		"v0.4.6":    {0, 4, 6},
+		"0.4.6":     {0, 4, 6},
+		"v1.0.0":    {1, 0, 0},
+		"v10.20.30": {10, 20, 30},
+	}
+	for in, want := range good {
+		got, err := parseSemver(in)
+		if err != nil {
+			t.Errorf("parseSemver(%q) errored: %v", in, err)
+			continue
+		}
+		if got != want {
+			t.Errorf("parseSemver(%q) = %+v, want %+v", in, got, want)
+		}
+	}
+
+	bad := []string{
+		"",
+		"main",
+		"v0.4",         // missing patch
+		"v0.4.6-rc1",   // pre-release not supported
+		"v0.4.6+build", // build metadata not supported
+		"go/v0.4.6",    // submodule prefix is for the OTHER tag, not this one
+		"latest",
+	}
+	for _, in := range bad {
+		if _, err := parseSemver(in); err == nil {
+			t.Errorf("parseSemver(%q) accepted, want error", in)
+		}
+	}
+}
+
+// TestParseLog verifies that parseLog ignores empty records, handles a
+// trailing record separator, and keeps multi-paragraph bodies intact.
+func TestParseLog(t *testing.T) {
+	// Format: subject \x00 body \x1e
+	in := "feat: thing\x00body line 1\n\nbody line 2\x1e" +
+		"fix: other\x00\x1e" +
+		"\x1e" + // empty record between (gracefully ignored)
+		"chore: third\x00single-line body\x1e"
+	got := parseLog(in)
+	if len(got) != 3 {
+		t.Fatalf("parseLog returned %d records, want 3 (got: %+v)", len(got), got)
+	}
+	if got[0].Subject != "feat: thing" {
+		t.Errorf("record 0 subject = %q", got[0].Subject)
+	}
+	if !strings.Contains(got[0].Body, "body line 1") || !strings.Contains(got[0].Body, "body line 2") {
+		t.Errorf("record 0 body lost multi-paragraph content: %q", got[0].Body)
+	}
+	if got[1].Subject != "fix: other" || got[1].Body != "" {
+		t.Errorf("record 1 = %+v, want subject 'fix: other' empty body", got[1])
+	}
+	if got[2].Subject != "chore: third" {
+		t.Errorf("record 2 subject = %q", got[2].Subject)
+	}
+}
+
+// TestReportShape locks the report text format because consumers
+// (CI logs, release-notes generators) parse it.
+func TestReportShape(t *testing.T) {
+	d := decideBump([]Commit{
+		{Subject: "feat: a"},
+		{Subject: "fix: b"},
+	}, nil)
+	r := d.Report()
+	mustContain := []string{
+		"commits analysed: 2",
+		"minor:1",
+		"patch:1",
+		"winning bump: minor",
+		"feat: a",
+		"fix: b",
+	}
+	for _, s := range mustContain {
+		if !strings.Contains(r, s) {
+			t.Errorf("report missing %q in:\n%s", s, r)
+		}
+	}
+}
diff --git a/ts/src/fake/index.ts b/ts/src/fake/index.ts
index 300fc76..1bf74c0 100644
--- a/ts/src/fake/index.ts
+++ b/ts/src/fake/index.ts
@@ -70,9 +70,19 @@ export class FakePairedBitBox {
     return this._calls.map((c) => ({ method: c.method, args: [...c.args] }));
   }
 
+  /** Clear the recorded call log without releasing the fake. */
+  clearCalls(): this {
+    this._calls = [];
+    return this;
+  }
+
   /**
    * Returns a Proxy that routes any property access into the handler
    * map. Use this as a drop-in replacement for `PairedBitBox`.
+   *
+   * The generic parameter is the wallet-API shape you expect (e.g. an
+   * import from `bitbox-api`). It is a pure type cast; the proxy does
+   * no runtime check against the type's structure.
    */
   asPairedBitBox<T = unknown>(): T {
     const self = this;
@@ -80,13 +90,20 @@ export class FakePairedBitBox {
       {},
       {
         get(_target, prop) {
-          // Allow direct access to a few synthetic helpers without
-          // going through dispatch.
+          // Symbol-keyed lookups (Symbol.toPrimitive, Symbol.asyncIterator)
+          // and the string-keyed then/catch/finally probes from awaiters
+          // must NOT be treated as dispatched methods: returning a
+          // function for `then` would make the proxy itself look
+          // thenable and infect await chains.
+          if (typeof prop === 'symbol') return undefined;
+          if (prop === 'then' || prop === 'catch' || prop === 'finally') return undefined;
+
+          // Synthetic helpers exposed directly on the proxy for
+          // introspection and cleanup paths.
           if (prop === '__fake__') return self;
-          if (prop === 'close') return () => self.close();
-          if (prop === 'free') return () => self.close();
+          if (prop === 'close' || prop === 'free') return () => self.close();
 
-          const method = String(prop);
+          const method = prop;
           return (...args: unknown[]) => {
             if (self._closed) {
               return Promise.reject(new ClosedError());
diff --git a/ts/src/index.ts b/ts/src/index.ts
index ac67577..a356d89 100644
--- a/ts/src/index.ts
+++ b/ts/src/index.ts
@@ -1,12 +1,12 @@
 /**
- * Main entry point of @joshuakrueger-dfx/bitbox-testkit.
+ * Main entry point of @DFXswiss/bitbox-testkit.
  *
  * Consumers typically import from the namespaced subpaths:
  *
- *   import { FakePairedBitBox } from '@joshuakrueger-dfx/bitbox-testkit/fake';
- *   import { Registry, subset } from '@joshuakrueger-dfx/bitbox-testkit/quirks';
- *   import { scenarioRegressionUmlautEIP712 } from '@joshuakrueger-dfx/bitbox-testkit/scenarios';
- *   import { detectNonAsciiInEIP712Literals } from '@joshuakrueger-dfx/bitbox-testkit/guards';
+ *   import { FakePairedBitBox } from '@DFXswiss/bitbox-testkit/fake';
+ *   import { Registry, subset } from '@DFXswiss/bitbox-testkit/quirks';
+ *   import { scenarioRegressionUmlautEIP712 } from '@DFXswiss/bitbox-testkit/scenarios';
+ *   import { detectNonAsciiInEIP712Literals } from '@DFXswiss/bitbox-testkit/guards';
  *
  * The default export re-exposes everything for convenience.
  */
diff --git a/ts/test/fake.test.ts b/ts/test/fake.test.ts
index 7febe8a..9920df6 100644
--- a/ts/test/fake.test.ts
+++ b/ts/test/fake.test.ts
@@ -62,4 +62,28 @@ describe('FakePairedBitBox', () => {
     });
     await expect(proxy.deviceInfo()).resolves.toEqual({ name: 'BB' });
   });
+
+  it('proxy does NOT pretend to be thenable (avoids awaiter false-positives)', () => {
+    const proxy = new FakePairedBitBox().asPairedBitBox<Record<string, unknown>>();
+    // Otherwise `await proxy` would call proxy.then(...) and treat the proxy as a thenable.
+    expect(proxy.then).toBeUndefined();
+    expect(proxy.catch).toBeUndefined();
+    expect(proxy.finally).toBeUndefined();
+  });
+
+  it('proxy returns undefined for symbol-keyed lookups', () => {
+    const proxy = new FakePairedBitBox().asPairedBitBox<Record<symbol, unknown>>();
+    expect((proxy as unknown as { [Symbol.iterator]?: unknown })[Symbol.iterator]).toBeUndefined();
+  });
+
+  it('clearCalls drops the recorded log without affecting handlers', async () => {
+    const fake = new FakePairedBitBox().on('a', async () => 'x');
+    const proxy = fake.asPairedBitBox<{ a: () => Promise<string> }>();
+    await proxy.a();
+    expect(fake.calls).toHaveLength(1);
+    fake.clearCalls();
+    expect(fake.calls).toHaveLength(0);
+    // handler still works
+    await expect(proxy.a()).resolves.toBe('x');
+  });
 });
diff --git a/ts/test/quirks.test.ts b/ts/test/quirks.test.ts
index 8737c84..99d8f04 100644
--- a/ts/test/quirks.test.ts
+++ b/ts/test/quirks.test.ts
@@ -4,7 +4,11 @@ import { Registry, subset, firmwareApplies } from '../src/quirks/index.js';
 import rawJson from '../src/quirks/quirks.json';
 
 describe('quirks registry', () => {
-  it('loads every quirk from quirks.json', () => {
+  // Self-consistent count: the Registry MUST expose exactly as many
+  // quirks as quirks.json defines. A hardcoded number goes stale
+  // every release; reading the source of truth keeps the assertion
+  // meaningful without needing a manual bump.
+  it('loads every quirk from quirks.json into the Registry', () => {
     expect(Registry.length).toBeGreaterThan(0);
     expect(Registry.length).toBe((rawJson as { quirks: unknown[] }).quirks.length);
   });