feat(temporal): sign captured $exception events with an HMAC attestation #62686

Closed
Gilbert09 wants to merge 1 commit into master from tom/sign-temporal-worker-exceptions
@Gilbert09

Member

Problem

Exceptions captured by Temporal workers flow into Error Tracking and can be forwarded to
downstream consumers (e.g. an internal app that auto-triages data-import failures). Because
the PostHog ingest key is public, anyone can capture a forged $exception event that is
indistinguishable from a genuine one — there is no way for a consumer to prove an exception
actually originated from our backend. That's a problem for any consumer that takes action on
exception contents (a forged message is an injection vector).

Changes

Temporal workers now attach an HMAC-signed attestation to every captured $exception
event. The attestation is a small, self-contained JSON blob (exception type, message,
top in-app frame, plus job context like team_id/run_id/workflow_run_id when present),
carried in two custom event properties:

  • $temporal_exception_attestation: the canonical attestation string
  • $temporal_exception_signature: HMAC-SHA256(secret, attestation), hex-encoded

Custom properties pass through Error Tracking ingestion byte-for-byte (unlike the reserved
$exception_* fields, which are truncated/symbolicated server-side), so a consumer sharing
the secret can verify the signature and trust the attestation contents.
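
A minimal sketch of the sign-and-verify round trip described above, using only the standard library. The helper names (`build_attestation`, `sign_attestation`, `verify_attestation`) and the canonical form (sorted keys, compact separators) are assumptions for illustration; the actual module may serialize differently.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def build_attestation(fields: Dict[str, Any]) -> str:
    """Serialize the attested fields deterministically.

    Assumption: "canonical" means sorted keys and no insignificant whitespace.
    """
    return json.dumps(fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sign_attestation(secret: str, attestation: str) -> str:
    """Return HMAC-SHA256(secret, attestation) as a hex string."""
    return hmac.new(secret.encode("utf-8"), attestation.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_attestation(secret: str, attestation: str, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_attestation(secret, attestation)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Hypothetical field names; the PR lists type, message, top frame and job context.
attestation = build_attestation({"type": "ValueError", "message": "boom", "team_id": 1})
signature = sign_attestation("shared-secret", attestation)
is_genuine = verify_attestation("shared-secret", attestation, signature)
```

The consumer verifies against the attestation string exactly as received rather than re-serializing parsed JSON, which is why the properties need to survive ingestion byte-for-byte.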

Implementation:

  • posthog/temporal/common/exception_signing.py — pure, stdlib-only builder + signer +
    a before_send hook factory (make_exception_signer).
  • Wired in start_temporal_worker.py: when TEMPORAL_EXCEPTION_SIGNING_SECRET is set,
    posthoganalytics.before_send is registered once per worker process. This covers every
    task queue and every capture path (the Temporal interceptor and inline
    capture_exception calls all funnel through the same client), and is scoped to worker
    processes only; the web/celery processes are untouched.
  • TEMPORAL_EXCEPTION_SIGNING_SECRET setting (no-op when unset).
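
The hook factory from the list above might look roughly like the following sketch. Only the name `make_exception_signer` and the two property names come from this PR; the event shape (`$exception_list` entries with `type`, `value`, and `stacktrace.frames`), the attestation field names, the frame ordering, and the 500-character message limit are assumptions for illustration.

```python
import hashlib
import hmac
import json
from typing import Any, Callable, Dict, Optional

MAX_MESSAGE_LENGTH = 500  # hypothetical truncation limit

Event = Dict[str, Any]


def _top_in_app_frame(exc: Dict[str, Any]) -> Optional[str]:
    """Return "filename:function" for the innermost in-app frame, if any.

    Assumption: frames are ordered oldest call first, so we scan in reverse.
    """
    stacktrace = exc.get("stacktrace")
    frames = stacktrace.get("frames") if isinstance(stacktrace, dict) else None
    if not isinstance(frames, list):
        return None
    for frame in reversed(frames):
        if isinstance(frame, dict) and frame.get("in_app"):
            return f"{frame.get('filename')}:{frame.get('function')}"
    return None


def make_exception_signer(secret: str) -> Callable[[Event], Event]:
    """Build a before_send hook that signs $exception events and passes others through."""
    key = secret.encode("utf-8")

    def before_send(event: Event) -> Event:
        try:
            if event.get("event") != "$exception":
                return event
            properties = event.setdefault("properties", {})
            exceptions = properties.get("$exception_list") or [{}]
            first = exceptions[0] if isinstance(exceptions[0], dict) else {}
            attestation = json.dumps(
                {
                    "type": first.get("type"),
                    "message": str(first.get("value") or "")[:MAX_MESSAGE_LENGTH],
                    "top_frame": _top_in_app_frame(first),
                },
                sort_keys=True,
                separators=(",", ":"),
            )
            properties["$temporal_exception_attestation"] = attestation
            properties["$temporal_exception_signature"] = hmac.new(
                key, attestation.encode("utf-8"), hashlib.sha256
            ).hexdigest()
        except Exception:
            # Signing must never break exception capture; send the event unsigned.
            pass
        return event

    return before_send
```

Per the description, the worker bootstrap would assign the returned callable to posthoganalytics.before_send only when the secret is configured, so an unset secret leaves events untouched.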

The mechanism is deliberately generic (not tied to any one product) — any temporal worker's
exceptions are signed, and any consumer with the secret can verify them.

How did you test this code?

This change was agent-assisted. I have not run the full Django pytest suite locally (the
local dev venv is incomplete — rapidfuzz and others are missing, so django.setup() fails
at collection); it will run in CI.

  • Added posthog/temporal/common/test_exception_signing.py (parametrized): signature
    determinism, non-$exception passthrough, missing/empty fields tolerated, message
    truncation, top-frame extraction, signature reproducibility, and that the hook never raises
    on malformed events.
  • Verified the module's logic standalone (it's pure stdlib): built a representative
    $exception event, ran the hook, and confirmed the two properties appear and the signature
    reproduces.
  • ruff check and ruff format --check pass on all changed files.
  • Cross-checked the exact attestation string + signature against the consumer's independent
    TypeScript implementation — both produce identical output for the same input (a parity test
    is committed on the consumer side).

Deploy note: the consumer enforces signatures strictly, so this worker change must be
deployed (and TEMPORAL_EXCEPTION_SIGNING_SECRET set, same value on both sides) before
the consumer starts enforcing, to avoid a gap.

Temporal workers attach a signed, self-contained attestation to every captured
$exception event so downstream consumers can verify the exception genuinely
originated from our backend (the PostHog ingest key is public, so forged
exceptions are otherwise indistinguishable). Wired as a posthoganalytics
before_send hook in the worker bootstrap, covering all task queues and capture
paths. No-op unless TEMPORAL_EXCEPTION_SIGNING_SECRET is set.

@Gilbert09 self-assigned this Jun 10, 2026
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Contributor
Comment on lines +12 to +21
all task queues. Pure stdlib so it is safe inside the workflow sandbox.
"""

import hmac
import json
import hashlib
import datetime as dt
from typing import Any, Callable, Optional

import structlog

P2 Docstring "Pure stdlib" claim is incorrect

The module-level docstring explicitly states "Pure stdlib so it is safe inside the workflow sandbox", but structlog (line 21) is a third-party package, not stdlib. Anyone reading this claim and importing the module inside a Temporal workflow sandbox would encounter an import error or sandbox violation, since sandbox environments typically restrict non-stdlib, non-whitelisted imports. Using logging from stdlib instead of structlog would make the claim true and remove the risk.
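
One way the suggested swap could look, as a sketch only: a stdlib `logging` logger in place of `structlog`, with context passed through `extra`. The helper name `log_signing_failure` and the field name `error_type` are hypothetical and do not come from the PR.

```python
import logging

# Stdlib logger; no third-party import, so the "pure stdlib" docstring claim would hold.
logger = logging.getLogger(__name__)


def log_signing_failure(error: Exception) -> None:
    """Record a signing failure without raising (hypothetical helper)."""
    logger.warning(
        "temporal exception signing failed",
        extra={"error_type": type(error).__name__},
    )
```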

@github-actions

Contributor

🎭 Playwright report · View test results →

1 failed test:

  • Save an insight, make changes, discard them, and save a copy (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

@Gilbert09

Member Author

Superseding this. The HMAC-in-before_send approach only authenticates the webhook delivery, but our consumer re-reads the exception from PostHog where forged events can group into a genuine issue — so signing the delivery doesn't secure it. Replaced by a platform feature: SDKs sign with a per-customer Ed25519 key and error-tracking ingestion (cymbal) verifies + stamps a server-only $exception_verified flag. See PostHog/posthog-python#657, #62750, #62751.

@Gilbert09 closed this Jun 10, 2026