Skip to content

Implement Edge Cookie identity system with KV graph and integration tests#621

Open
ChristianPavilonis wants to merge 35 commits intomainfrom
feature/edge-cookies-final
Open

Implement Edge Cookie identity system with KV graph and integration tests#621
ChristianPavilonis wants to merge 35 commits intomainfrom
feature/edge-cookies-final

Conversation

@ChristianPavilonis
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis commented Apr 7, 2026

Summary

  • Implement the Edge Cookie (EC) identity system — a publisher-owned, first-party cookie-based identity mechanism that replaces the previous Synthetic ID concept. EC enables consent-gated identity generation, a KV-backed identity graph with CAS concurrency, partner sync (pixel, batch, pull), an identify endpoint, and auction bidstream decoration with partner EIDs.
  • Add 7 integration test scenarios covering the full EC lifecycle: generation, consent withdrawal, identify edge cases, concurrent partner syncs, and batch sync flows.
  • Rename "Synthetic ID" → "Edge Cookie" across the entire codebase, configuration, and documentation.

Changes

File / Area Change
crates/trusted-server-core/src/ec/ (new module) Full EC implementation: mod.rs (lifecycle orchestration), generation.rs (HMAC-SHA256), cookies.rs (cookie read/write), consent.rs (consent gating), device.rs (device signal derivation, bot gate), kv.rs + kv_types.rs (KV identity graph with CAS), partner.rs (partner registry + admin), sync_pixel.rs (pixel sync), batch_sync.rs (S2S batch sync), pull_sync.rs (background pull sync), identify.rs (identity lookup with CORS), eids.rs (EID encoding), finalize.rs (response finalization middleware), admin.rs (admin endpoints)
crates/integration-tests/ 7 new EC lifecycle test scenarios in tests/common/ec.rs and tests/frameworks/scenarios.rs; updated integration.rs test runner and Viceroy config fixtures
crates/trusted-server-adapter-fastly/src/main.rs New EC endpoint routes namespaced under /_ts/api/v1/ and /_ts/admin/; send_to_client() pattern with background pull sync dispatch
crates/trusted-server-core/src/auction/ endpoints.rs and formats.rs updated to decorate bid requests with partner EIDs from the KV identity graph (user.id, user.eids, user.consent)
crates/trusted-server-core/src/integrations/prebid.rs Bidder deduplication fix; blank auction EC header handling
crates/js/lib/src/integrations/prebid/index.ts JS-side Prebid integration updates for EC headers
crates/trusted-server-core/src/synthetic.rs Deleted — replaced by the EC module
crates/trusted-server-core/src/cookies.rs Slimmed down; EC-specific cookie logic moved to ec/cookies.rs
crates/trusted-server-core/src/settings.rs + settings_data.rs EC configuration fields, partner config, consent config migration
crates/trusted-server-core/src/consent/ Updated consent module for EC-specific consent checks; KV consent storage
crates/trusted-server-core/src/http_util.rs New HTTP helpers for EC endpoints (CORS, request parsing)
crates/trusted-server-core/src/proxy.rs EC header propagation in proxy layer
crates/trusted-server-core/src/publisher.rs Publisher config updates for EC
crates/trusted-server-core/src/constants.rs New EC-related constants
docs/guide/ec-setup-guide.md (new) End-to-end EC setup documentation
docs/guide/edge-cookies.md (new) EC concept overview
docs/guide/api-reference.md API reference updated with all EC endpoints
docs/guide/configuration.md Configuration docs updated for EC settings
docs/guide/synthetic-ids.md Deleted — replaced by EC docs
docs/ (various) Documentation updates reflecting Synthetic ID → Edge Cookie rename
docs/internal/superpowers/specs/2026-03-24-ssc-technical-spec-design.md EC technical spec updated with schema extensions, device signals, bot gate
fastly.toml New KV store bindings for EC
trusted-server.toml EC configuration section added

Closes

Closes #532
Closes #533
Closes #535
Closes #536
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #542
Closes #543
Closes #544
Closes #611
Closes #612

Follow-up

Follow-up: The pull sync mechanism (ec/pull_sync.rs) currently passes ec_id and client_ip as query parameters to partner endpoints. While connections are HTTPS-only, these values can appear in partner server access logs and CDN logs. A future improvement should migrate to POST body or hash/encrypt identifiers before transmission. Tracked separately.

Test plan

  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS tests: cd crates/js/lib && npx vitest run
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format
  • Other: 7 EC lifecycle integration test scenarios (full lifecycle, consent withdrawal, identify edge cases, concurrent partner syncs, batch sync)

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

@ChristianPavilonis ChristianPavilonis self-assigned this Apr 7, 2026
- Rename 'Synthetic ID' to 'Edge Cookie (EC)' across all external-facing
  identifiers, config, internal Rust code, and documentation
- Simplify EC hash generation to use only client IP (IPv4 or /64-masked
  IPv6) with HMAC-SHA256, removing User-Agent, Accept-Language,
  Accept-Encoding, random_uuid inputs and Handlebars template rendering
- Downgrade EC ID generation logs to trace level since client IP and EC
  IDs are sensitive data
- Remove unused counter_store and opid_store config fields and KV store
  declarations (vestigial from template-based generation)
- Remove handlebars dependency

Breaking changes: wire field synthetic_fresh → ec_fresh, response headers
X-Synthetic-ID → X-TS-EC, cookie synthetic_id → ts-ec, query param
synthetic_id → ts-ec, config section [synthetic] → [edge_cookie].

Closes #462
The EC rename commit (984ba2b) accidentally re-introduced the
reject_placeholder_secrets() call and InsecureDefault tests that were
intentionally removed in 4c29dbf. Replace with log::warn() for
placeholder detection and restore the simple smoke test.
…igration

- Add ec/ module with EcContext lifecycle, generation, cookies, and consent
- Compute cookie domain from publisher.domain, move EC cookie helpers
- Fix auction consent gating, restore cookie_domain for non-EC cookies
- Add integration proxy revocation, refactor EC parsing, clean up ec_hash
- Remove fresh_id and ec_fresh per EC spec §12.1
- Migrate [edge_cookie] config to [ec] per spec §14
Implement Story 3 (#536): KV-backed identity graph with compare-and-swap
concurrency, partner ID upserts, tombstone writes for consent withdrawal,
and revive semantics. Includes schema types, metadata, 300s last-seen
debounce, and comprehensive unit tests.

Also incorporates earlier foundation work: EC module restructure, config
migration from [edge_cookie] to [ec], cookie domain computation, consent
gating fixes, and integration proxy revocation support.
Implement Story 4 (#537): partner KV store with API key hashing,
POST /admin/partners/register with basic-auth protection, strict
field validation (ID format, URL allowlists, domain normalization),
and pull-sync config validation. Includes index-based API key lookup
and comprehensive unit tests.
Implement Story 5 (#538): centralize EC cookie set/delete and KV
tombstone writes in finalize_response(), replacing inline mutation
scattered across publisher and proxy handlers. Adds consent-withdrawal
cleanup, EC header propagation on proxy requests, and docs formatting.
Implement Story 8 (#541): POST /api/v1/sync with Bearer API key auth,
per-partner rate limiting, batch size cap, per-mapping validation and
rejection reasons, 200/207 response semantics, tolerant Bearer parsing,
and KV-abort on store unavailability.
Implement Story 9 (#542): server-to-server pull sync that runs after
send_to_client() on organic traffic only. Refactors the Fastly adapter
entrypoint from #[fastly::main] to explicit Request::from_client() +
send_to_client() to enable post-send background work.

Pull sync enumerates pull-enabled partners, checks staleness against
pull_sync_ttl_sec, validates URL hosts against the partner allowlist,
enforces hourly rate limits, and dispatches concurrent outbound GETs
with Bearer auth. Responses with uid:null or 404 are no-ops; valid
UIDs are upserted into the identity graph.

Includes EC ID format validation to prevent dispatch on spoofed values,
partner list_registered() for KV store enumeration, and configurable
pull_sync_concurrency (default 3).
Implement Story 11 (#544): Viceroy-driven E2E tests covering full EC
lifecycle (generation, pixel sync, identify, batch sync, consent
withdrawal, auth rejection). Adds EC test helpers with manual cookie
tracking, minimal origin server with graceful shutdown, and required
KV store fixtures. Fixes integration build env vars.
Consolidate is_valid_ec_hash and current_timestamp into single canonical
definitions to eliminate copy-paste drift across the ec/ module tree. Fix
serialization error variants in admin and batch_sync to use Ec instead of
Configuration. Add scaling and design-decision documentation for partner
store enumeration, rate limiter burstiness, and plaintext pull token storage.
Use test constructors consistently in identify and finalize tests.
- Rename ssc_hash → ec_hash in batch sync wire format (§9.3)
- Strip x-ts-* prefix headers in copy_custom_headers (§15)
- Strip dynamic x-ts-<partner_id> headers in clear_ec_on_response (§5.2)
- Add PartnerNotFound and PartnerAuthFailed error variants (§16)
- Rename Ec error variant → EdgeCookie (§16)
- Validate EC IDs at read time, discard malformed values (§4.2)
- Add rotating hourly offset for pull sync partner dispatch (§10.3)
- Add _pull_enabled secondary index for O(1+N) pull sync reads (§13.1)
…nd cleanup

- Add body size limit (64 KiB) to partner registration
- Validate partner UID length (max 512 bytes) in batch sync and sync pixel
- Replace linear scan with binary search in encode_eids_header
- Use constant-time comparison inline in partner lookup, remove unused verify_api_key
- Remove unused PartnerAuthFailed error variant, fix PartnerNotFound → 404
- Add Access-Control-Max-Age CORS header to identify endpoint
- Tighten consent-denied integration test to expect only 403
- Add stability doc-comment to normalize_ip
- Log warning instead of silent fallback on SystemTime failure
…ror variants

Resolve integration issues from rebasing onto feature/ssc-update:
- Restore prepare_runtime() and validate_cookie_domain() lost in conflict resolution
- Add InsecureDefault error variant and wire reject_placeholder_secrets() into get_settings()
- Add sha2/subtle imports for constant-time auth comparison
- Fix error match arms (Ec → EdgeCookie, remove nonexistent PartnerAuthFailed)
- Fix orchestrator error handling to use send_to_client() pattern
- Remove dead cookie helpers superseded by ec/cookies module
Subresource requests (fonts, images, CSS) may omit the Sec-GPC header,
causing the server to incorrectly generate ts-ec cookies when the user
has opted out via Global Privacy Control. Gate generate_if_needed() on
the request Accept header containing text/html so only navigations
trigger EC identity creation.
Move admin route matching and basic-auth coverage to /_ts/admin for a hard cutover, and align tests and docs so operational guidance matches runtime behavior.
Addresses issue #612 - spec now correctly documents that the full EC ID
({64-hex}.{6-alnum}) is used as the KV store key, not just the 64-char
hash prefix.

Changes:
- Updated §4.1: ec_hash() now documented as for logging/debugging only
- Updated §7.2: KV key description changed from '64-character hex hash'
  to 'Full EC ID in {64-char hex}.{6-char alphanumeric} format'
- Updated §7.3: All KvIdentityGraph method parameters renamed from
  ec_hash to ec_id with proper documentation
- Updated §9.3: Batch sync request field renamed from ec_hash to ec_id
- Updated §9.4: Validation and error reasons updated (invalid_ec_hash
  → invalid_ec_id, ec_hash_not_found → ec_id_not_found)
- Updated §10.4: Pull sync URL parameter changed from ec_hash to ec_id
- Updated consent pipeline integration throughout to use full EC ID
- Updated all rate limiting descriptions (per EC ID, not per hash)

Rationale: The random suffix provides uniqueness for users behind the
same NAT/proxy infrastructure who would otherwise share identical
IP-derived hash prefixes.
Extends EC KV schema for cross-property identity resolution:

- Add asn field to GeoInfo (from Fastly geo.as_number())
- Add asn and dma fields to KvGeo for network identification
- Add KvDomainVisit and KvPubProperties for consortium-level domain tracking
- Add pub_properties field to KvEntry with 50-domain cap
- Track publisher domain visits in KvEntry::new() and update_last_seen()
- Respect existing 300s debounce for organic requests only

All new fields use Option types or serde(default) for backward compatibility.
Existing v1 entries continue to deserialize without error.
Implements cluster size evaluation to distinguish individual users from
shared networks (VPNs, corporate offices):

- Add KvNetwork struct with cluster_size and last_evaluated timestamp
- Add network field to KvEntry with TTL-gated cluster rechecks
- Add cluster_size to KvMetadata and IdentifyResponse
- Implement count_hash_prefix_keys() to list keys with common prefix
- Implement evaluate_cluster() on KvIdentityGraph (one-page, 100-key limit)
- Call cluster evaluation in handle_identify endpoint
- Return cluster_size in JSON body and x-ts-cluster-size header
- Add cluster_trust_threshold (default 10) and cluster_recheck_secs (default 3600) config

Cluster evaluation uses best-effort semantics: size unknown if list exceeds
100 keys. Cache hit avoids re-evaluation within recheck interval.
Derives coarse browser signals from TLS/H2/UA on every request to gate
EC identity operations. Unrecognized clients (known_browser != true) are
proxied normally but leave no trace in the identity graph.

- Add KvDevice struct (is_mobile, ja4_class, platform_class, h2_fp_hash,
  known_browser) and device field on KvEntry, written once on creation
- Add ec/device.rs with DeviceSignals::derive(), UA parsing, JA4 Section 1
  extraction, H2 fingerprint hashing, known browser allowlist (Chrome/
  Safari/Firefox)
- Add is_mobile and known_browser to KvMetadata for fast propagation checks
- Wire DeviceSignals through EcContext to KvEntry creation path
- Add bot gate in Fastly adapter: suppress KV graph, ec_finalize_response,
  and pull sync when known_browser != Some(true)
…bot gate

Document all KV schema additions implemented in the preceding commits:
geo extensions (asn/dma), publisher domain tracking, network cluster
evaluation, device signal derivation, and the bot gate architecture.

- Add §7A Device Signals and Bot Gate (signal derivation, allowlist,
  bot gate behavior matrix, KvDevice write policy, privacy rationale)
- Update §7.2 with full KvEntry schema including KvGeo, KvPubProperties,
  KvDomainVisit, KvDevice, KvNetwork, and extended KvMetadata
- Update §2 architecture diagram with Phase 0 bot gate step
- Update §4.3 EcContext with device_signals field
- Update §5.4 lifecycle with Phase 0 and ec_finalize gating
- Update §11 /identify with cluster_size in JSON and x-ts-cluster-size header
- Update §14 config with cluster_trust_threshold and cluster_recheck_secs
- Update §17.1 main.rs pseudocode with full bot gate wiring
The known_browser fingerprint allowlist (3 entries) was too narrow and
blocked legitimate browsers whose JA4/H2 combinations were not listed.

Replace the gate with DeviceSignals::looks_like_browser() which checks
for signal presence: ja4_class.is_some() && platform_class.is_some().
Real browsers always produce both; raw HTTP clients typically lack one
or both. known_browser is still computed and stored on KvDevice for
analytics but no longer gates identity operations.
Entries created before pub_properties was added have the field as None.
The previous code only updated pub_properties when it already existed
(if let Some), so returning users on pre-existing entries never got
domain tracking.

Now when pub_properties is None, update_last_seen initializes it with
the current domain as origin_domain and first seen_domains entry. This
is a one-time backfill per entry — subsequent visits take the existing
update path.
…t consistency

Address PR review findings across 11 files:
- Elevate current_timestamp() fallback log from warn to error
- Cache known_browser_h2_hashes() with OnceLock instead of recomputing per request
- Unify MAX_UID_LENGTH to 512 across sync_pixel, batch_sync, and pull_sync
- Fix identify json_response to use EdgeCookie error variant instead of Configuration
- Remove unused _geo_info param from ec_finalize_response
- Remove dead uppercase X- prefix check in copy_custom_headers
- Add navigation-request Accept-header fallback debug logging
- Document RefCell intent, pull-enabled index race condition, normalize_ec_id_for_kv
  defense-in-depth, and consent withdrawal test jurisdiction dependency
…odule

Resolve compilation errors from rebasing onto main's PlatformKvStore
abstraction layer:
- Add asn field to platform GeoInfo and remove duplicate struct from geo.rs
- Merge geo_from_fastly helper to include ASN extraction
- Pass kv_store: None in EcContext consent pipeline (EC manages own KV)
- Update finalize.rs consent deletion to use Fastly KVStore directly
- Fix TrustedServerError::Ec -> EdgeCookie in error test
- Allow deprecated GeoInfo::from_request in EC and adapter entry points
- Update route_tests for RouteOutcome struct and EC architecture changes
- Restore runtime_services construction in adapter entry point
@ChristianPavilonis ChristianPavilonis force-pushed the feature/edge-cookies-final branch from de0525d to 0940c34 Compare April 7, 2026 19:01
@ChristianPavilonis ChristianPavilonis requested review from aram356 and prk-Jr and removed request for aram356 April 7, 2026 19:04
Add log_id() helper that returns only the first 8 chars of an EC ID
for log messages. Addresses CodeQL 'cleartext logging of sensitive
information' findings on kv.rs:611 and kv.rs:748, and applies the
same treatment to all ec_id log statements in the file for consistency.
…and code quality

Address review findings across the EC identity system:

- Extract shared log_id() and truncate EC IDs/client IPs in all 26 log
  sites to satisfy CodeQL cleartext-logging rules (P0)
- Enforce HTTPS on sync pixel return URLs, restrict EC ID validation to
  lowercase hex, add cookie debug_assert and API key hash collision guard (P1)
- Add 2MB body size limit on batch sync, minimum passphrase length,
  whitespace-only pull token rejection (P1/P2)
- Fix withdrawal_ec_ids double computation, pull_sync double ec_value()
  call, and downgrade CAS conflict on update_last_seen to debug (P1/P2)
- Deduplicate MAX_UID_LENGTH and normalize_ec_id_for_kv to shared
  locations, unify ec::consent wrapper usage (P1/P2)
- Fix test extract_ec_cookie short-circuit bug, MinimalOrigin HTTP
  response whitespace, pixel sync Location header assertion (P0/P1)
- Switch ad-proxy base64 to URL_SAFE_NO_PAD for path-safe encoding (P1)
- Cleanup: remove stale #[allow(unused)], fix typos, add Debug derives,
  add mobile signal named constants, redact api_key in Debug output
match futures::executor::block_on(store.put_bytes_with_ttl(ec_id, body, ttl)) {
Ok(()) => {
log::info!("Saved consent to KV store for '{ec_id}' (fp={fp}, ttl={max_age_days}d)");
log::info!(
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed — clean separation of concerns, strong input validation, and solid concurrency model. A few cleartext logging issues and a docs inconsistency need addressing before merge.

Blocking

🔧 wrench

  • Cleartext EC ID logging: Full EC IDs are logged at warn!/error!/trace! levels in 5 locations across ec/mod.rs (lines 128, 211, 296), sync_pixel.rs (line 91), and pull_sync.rs (line 91). The log_id() truncation helper was introduced in this PR for exactly this purpose but is not used consistently. See inline comments for fixes.
  • Stale env var in docs: TRUSTED_SERVER__EDGE_COOKIE__SECRET_KEY still appears in configuration.md (lines 40, 956) and error-reference.md (line 154). The PR renamed the TOML section from [edge_cookie] to [ec] and updated some env var references but missed these. Users following the docs will set a variable that is silently ignored, resulting in a startup failure from the placeholder passphrase. Should be TRUSTED_SERVER__EC__PASSPHRASE.

❓ question

  • HttpOnly omitted from EC cookie (ec/cookies.rs:11): Intentionally omitted for a hypothetical future JS use case. Is there a concrete plan? The identify endpoint already exposes the EC ID. If not, HttpOnly would be cheap XSS defense-in-depth.

Non-blocking

🤔 thinking

  • Uppercase EC ID rejection in batch sync (batch_sync.rs:209): is_valid_ec_id rejects uppercase hex, but batch sync validates before normalizing (line 225). Partners submitting uppercase EC IDs get invalid_ec_id instead of normalization.

♻️ refactor

  • _pull_enabled index lacks CAS (partner.rs:564): Read-modify-write without generation markers. Concurrent partner registrations can overwrite each other's index updates. Self-healing via fallback, but inconsistent with the CAS discipline used elsewhere.

🌱 seedling

  • Integration tests don't verify identify response shape: FullLifecycle and ConcurrentPartnerSyncs only assert uids.<partner>. The ec, consent, degraded, eids, and cluster_size fields from the API reference are never checked.
  • Pixel sync failure paths untested end-to-end: ts_reason=no_ec, no_consent, domain mismatch redirects are unit-tested but not covered in integration tests.

📝 note

  • trusted-server.toml ships banned placeholder: passphrase = "trusted-server" is rejected by reject_placeholder_secrets. Works in CI via env override, but new contributors will hit a confusing startup error. A TOML comment would help.

⛏ nitpick

  • Eid/Uid missing Deserialize: openrtb.rs types derive Serialize only. If the auction ever needs to parse EIDs from provider responses, Deserialize will be needed.

CI Status

  • cargo fmt: PASS
  • clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • integration tests: FAIL
  • CodeQL: FAIL (likely related to cleartext logging findings above)

.filter(|v| is_valid_ec_id(v))
.or_else(|| parsed.cookie_ec.filter(|v| is_valid_ec_id(v)));
if let Some(ref id) = ec_id {
log::trace!("Existing EC ID found: {id}");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Full EC ID logged at trace! level.

The log_id() helper was introduced in this PR to prevent cleartext logging of user identifiers, but this call site still logs the full value. Same issue on line 211.

Fix:

log::trace!("Existing EC ID found: {}…", super::log_id(id));


if let Err(err) = graph.create_or_revive(ec_value, &entry) {
log::error!(
"Failed to create or revive EC entry for id '{}' after generation: {err:?}",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Full EC ID logged at error! level (production log sink).

ec_value is the full {64hex}.{6alnum} identifier. This is at error! level which is always shipped to production logs.

Fix:

log::error!(
    "Failed to create or revive EC entry for id '{}…' after generation: {err:?}",
    log_id(ec_value),
);

log::warn!(
"Pixel sync write failed for partner '{}' and ec_id '{}': {err:?}",
partner.id,
cookie_ec_id,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Full EC ID logged at warn! level.

cookie_ec_id is the raw EC identifier. Should use log_id() for consistency with the rest of the module.

Fix:

log::warn!(
    "Pixel sync write failed for partner '{}' and ec_id '{}…': {err:?}",
    partner.id,
    super::log_id(&cookie_ec_id),
);

Err(err) => {
log::warn!(
"Pull sync: failed to read identity graph for '{}': {err:?}",
context.ec_id()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Full EC ID logged at warn! level.

Should use log_id() like other modules in the EC subsystem.

Fix:

log::warn!(
    "Pull sync: failed to read identity graph for '{}…': {err:?}",
    super::log_id(context.ec_id())
);

//! - `Secure` restricts to HTTPS
//! - `SameSite=Lax` provides CSRF protection while allowing top-level navigations
//! - `Max-Age` of 1 year (or 0 to expire)
//! - No `HttpOnly` — intentionally omitted so the cookie is available to
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question — Is there a concrete near-term use case for client-side JS reading the EC cookie via document.cookie?

The /_ts/api/v1/identify endpoint already provides the EC ID in the response body and x-ts-ec header. If no JS use case is planned, adding HttpOnly would be a low-cost XSS defense-in-depth improvement.

let mut errors = Vec::new();

for (idx, mapping) in mappings.iter().enumerate() {
if !is_valid_ec_id(&mapping.ec_id) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 thinkingis_valid_ec_id rejects uppercase hex, but validation runs before normalize_ec_id_for_kv (line 225).

If a partner submits an EC ID with uppercase hex (e.g. AABB...), it is rejected as invalid_ec_id even though normalization on line 225 would have fixed it. Meanwhile is_valid_ec_hash (used elsewhere) accepts uppercase.

Consider either normalizing before validation, or explicitly documenting in the API reference that partners MUST send lowercase EC IDs.

/// will be re-added on the next `upsert()` for that partner, and
/// `pull_enabled_partners()` falls back to `list_registered()` when the
/// index is missing or stale.
fn update_pull_enabled_index(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactorupdate_pull_enabled_index performs read-modify-write without CAS.

The comment documents the race condition and fallback, which is appreciated. For a future iteration, consider using CAS (generation markers) here too — the same pattern used in kv.rs for identity graph writes would make this self-consistent.

Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed with clean separation of concerns, strong concurrency discipline, and solid security choices throughout. Four new blocking findings join the existing ones — three are input-validation gaps and one is a PII leak in the pull sync URL builder that should be resolved before merge.

Blocking

🔧 wrench

  • PII leak: client IP forwarded to partnerspull_sync.rs:273: build_pull_request_url appends the raw user IP as the ip query parameter on every outbound pull-sync request, exposing it in partner access logs. Contradicts the privacy-preserving design intent and is likely a GDPR Article 5 concern. Remove the ip parameter or gate it behind an explicit per-partner allow_ip_forwarding config flag.
  • Pull sync response body unboundedpull_sync.rs:325: take_body_bytes() with no size cap before JSON parsing. A malicious partner can exhaust WASM memory. Batch sync and admin endpoints both have MAX_BODY_SIZE guards; this path needs one too (64 KiB is generous for a {"uid":"..."} response).
  • Whitespace-only UIDs bypass validationbatch_sync.rs:217 and sync_pixel.rs:41: is_empty() passes " " and "\t", which get stored as garbage EID values in the KV graph. Replace with trim().is_empty() at both sites.
  • rand::thread_rng() WASM compatibility unverifiedgeneration.rs:52: requires getrandom with the wasi feature on wasm32-wasip1. Native cargo test passing does not prove the WASM build is safe; integration tests are already failing so this may not have been caught. Switch to OsRng or add getrandom = { version = "0.2", features = ["wasi"] } explicitly.

Non-blocking

🤔 thinking

  • process_mappings UpsertResult branches untestedbatch_sync.rs:232: NotFound, ConsentWithdrawn, and Stale have zero unit test coverage. Stale is documented as "counted as accepted" with no regression test.
  • Shared error messages make pull sync validation tests non-isolatingpartner.rs:152,165: both missing-URL and missing-token conditions return the identical error string, so the tests asserting on substrings pass even if the wrong branch fires. Use distinct messages per condition.

♻️ refactor

  • ec_consent_granted has no testsconsent.rs:20: the entry-point gate for all EC creation has no #[cfg(test)] section. Add smoke tests for granted and denied paths.
  • KV tombstone path never exercised in finalize testsfinalize.rs:149: all finalize tests pass kv: None, so the tombstone write and the cookie-EC ≠ active-EC case in withdrawal_ec_ids are untested.
  • Missing HMAC prefix stability testgeneration.rs:228: no test asserts that the same IP + passphrase always produces the same 64-char hash prefix, which is the core deduplication guarantee for the KV graph.
  • normalize_ec_id duplicated in integration testsintegration-tests/tests/common/ec.rs:374: reimplements normalize_ec_id_for_kv from core. Re-export and use the canonical implementation.

⛏ nitpick

  • Use saturating_sub for consistencykv.rs:605: the subtraction is safe due to the guard above but inconsistent with saturating_sub used throughout the rest of the module.
  • log_id should encapsulate the suffixmod.rs:51: every call site manually appends "…" in the format string; move it inside the function.

Praises 👍

  • CAS-based optimistic concurrency (kv.rs): bounded retries, graceful ItemPreconditionFailed handling, and MAX_CAS_RETRIES preventing infinite loops — textbook correct for a single-writer KV model.
  • Constant-time API key comparison (partner.rs): subtle::ConstantTimeEq for timing attack prevention on key lookups. Many implementations miss this entirely.
  • KvMetadata fast-path consent check (kv_types.rs): mirroring ok/country/cluster_size in metadata to avoid streaming the full KV body is an excellent performance optimisation with the right constraint test.
  • evaluate_known_browser with OnceLock-cached hash table (device.rs): pre-hashing fingerprints once and caching via OnceLock is the right WASM-compatible lazy-init pattern.
  • HMAC stability explicitly documented (generation.rs:30-33): noting that the output format is a "stable contract" that would invalidate all existing identities if changed is exactly the kind of correctness annotation that prevents future breakage.

CI Status

  • cargo fmt: PASS
  • cargo clippy: PASS
  • cargo test: PASS
  • vitest: PASS
  • integration tests: FAIL
  • CodeQL: FAIL (cleartext logging — covered in prior review)

base_url
.query_pairs_mut()
.append_pair("ec_id", ec_id)
.append_pair("ip", client_ip);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — PII leak: raw client IP forwarded to external partner servers.

Every outbound pull-sync request appends the user's raw IP address as a query parameter. Even over HTTPS, the IP appears in partner server access logs and any intermediary CDN logs. This directly contradicts the privacy-preserving design intent of the EC system — normalize_ip in generation.rs already truncates IPv6 to /64 precisely to reduce per-user tracking.

This is likely a GDPR Article 5 concern (data minimisation). The PR's own follow-up note acknowledges this.

Fix: Remove the ip query parameter. If partners need geo-resolution, pass the pre-stored country from KvGeo (a non-personal aggregate). If forwarding is intentional for specific partners, gate it behind an explicit allow_ip_forwarding: bool field in PartnerRecord so it is an opt-in decision documented in the partner contract.

// Remove this line:
.append_pair("ip", client_ip);

return None;
}

let body = response.take_body_bytes();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Pull sync partner response body has no size limit before parsing.

take_body_bytes() fully buffers the response into a Vec<u8> before JSON parsing. A misbehaving or malicious pull-sync partner could return a multi-megabyte body and exhaust WASM memory. Both the batch sync and admin endpoints apply explicit MAX_BODY_SIZE guards — this path is missing one.

The expected response schema is { "uid": "<string>" }, so a generous 64 KiB cap is more than sufficient.

Fix:

const MAX_PULL_RESPONSE_BYTES: usize = 64 * 1024;
let body = response.take_body_bytes();
if body.len() > MAX_PULL_RESPONSE_BYTES {
    log::warn!(
        "Pull sync: partner '{}…' returned oversized response ({} bytes), rejecting",
        log_id(partner_id),
        body.len()
    );
    return None;
}
let payload = match serde_json::from_slice::<PullSyncResponse>(&body) {

continue;
}

if mapping.partner_uid.is_empty() || mapping.partner_uid.len() > MAX_UID_LENGTH {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Whitespace-only UIDs bypass validation.

mapping.partner_uid.is_empty() returns false for " " or "\t". A partner could write a whitespace-only UID that passes this check, gets stored in KV, and later surfaces as a garbage EID value in auction bid requests (resolve_partner_ids also uses is_empty()).

The same gap exists in sync_pixel.rs — the pixel sync path only checks MAX_UID_LENGTH with no emptiness check at all (line ~41).

Fix: Replace is_empty() with trim().is_empty() at both call sites:

// batch_sync.rs
if mapping.partner_uid.trim().is_empty() || mapping.partner_uid.len() > MAX_UID_LENGTH {

// sync_pixel.rs (add alongside the length check)
if query.uid.trim().is_empty() {
    return pixel_redirect(&partner, TsReason::InvalidUid);
}

) -> Result<Response, Report<TrustedServerError>> {
let query = SyncQuery::parse(req)?;

if query.uid.len() > MAX_UID_LENGTH {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrench — Pixel sync UID has no emptiness check.

This code path validates uid.len() > MAX_UID_LENGTH but never checks whether the UID is empty or whitespace-only. An empty uid query parameter would pass through, call upsert_partner_id with an empty string, and store an invalid entry in the KV graph.

See also the companion finding in batch_sync.rs:217.

Fix:

if query.uid.trim().is_empty() {
    return pixel_redirect(&partner, TsReason::InvalidUid);
}
if query.uid.len() > MAX_UID_LENGTH {
    // existing check
}


/// Generates a random alphanumeric string of the specified length.
fn generate_random_suffix(length: usize) -> String {
let mut rng = rand::thread_rng();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 wrenchrand::thread_rng() WASM compatibility is unverified.

rand::thread_rng() on wasm32-wasip1 requires the getrandom crate with the wasi feature enabled. rand 0.8 with default features uses getrandom internally, but the WASI feature must be explicitly declared. The native cargo test passing does not prove the WASM build is safe — the CI description shows cargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1 as a gate, but integration tests are currently failing so this may not have been verified post-merge.

Fix: Switch to rand::rngs::OsRng (which explicitly uses getrandom) or add an explicit getrandom dependency with features = ["wasi"] to ensure the entropy source compiles correctly on the target:

# Cargo.toml
[dependencies]
getrandom = { version = "0.2", features = ["wasi"] }

Alternatively, verify with a CI step that runs cargo build --target wasm32-wasip1 for trusted-server-core.

}
}

fn withdrawal_ec_ids(ec_context: &EcContext) -> HashSet<String> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactorwithdrawal_ec_ids logic is untested, and the KV tombstone path is never exercised by unit tests.

All finalize tests call ec_finalize_response(…, None, …) — passing None for the KV store — so the tombstone write path is never triggered. The withdrawal_ec_ids function also handles the case where the cookie EC differs from the active EC (both should be tombstoned), which is not directly tested.

Suggested tests for withdrawal_ec_ids (make it pub(crate) temporarily):

  1. Only cookie present → returns {cookie_ec}
  2. Cookie and active EC equal → returns {ec} (deduplicated)
  3. Cookie and active EC differ → returns both
  4. Malformed cookie value → filtered out

And at least one finalize test should pass a real MockKv to verify the tombstone write is invoked.

}

#[test]
fn generate_produces_valid_format() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactor — Missing test for HMAC prefix stability (the core deduplication guarantee).

generate_produces_valid_format verifies the output format but not the determinism property: two calls with the same IP and passphrase must produce the same 64-char hash prefix. This is the foundation of KV graph deduplication — if the prefix ever becomes non-deterministic (e.g. a change to normalize_ip or the HMAC input), all existing identities are silently invalidated.

Fix:

#[test]
fn generate_ec_id_same_ip_produces_consistent_hash_prefix() {
    let settings = create_test_settings();
    let id1 = generate_ec_id(&settings, "192.168.1.1").expect("should generate");
    let id2 = generate_ec_id(&settings, "192.168.1.1").expect("should generate");
    assert_eq!(
        ec_hash(&id1),
        ec_hash(&id2),
        "same IP must produce same HMAC prefix"
    );
    assert_ne!(id1, id2, "random suffix should differ between calls");
}

/// Normalizes an EC ID for KV key usage.
///
/// Lowercases the 64-char hex hash prefix and preserves the 6-char suffix.
pub fn normalize_ec_id(ec_id: &str) -> String {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ refactornormalize_ec_id reimplements normalize_ec_id_for_kv from core.

This helper duplicates the logic from crates/trusted-server-core/src/ec/generation.rs. If the normalization format ever changes in the core crate, the integration tests would silently test against a stale format and could give false positives.

Fix: Re-export normalize_ec_id_for_kv as pub from the core crate (or as pub(crate) accessible to integration tests via the test-support feature) and call it directly here. Remove the local copy.

}

// Debounce: skip if the stored value is recent enough.
if timestamp - entry.last_seen < LAST_SEEN_DEBOUNCE_SECS {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick — Use saturating_sub for consistency with the rest of the module.

The subtraction timestamp - entry.last_seen is safe here because the guard at line 595 guarantees timestamp > entry.last_seen. However, this is non-obvious and inconsistent with saturating_sub used throughout the rest of kv.rs (e.g. lines 787, 792). A future change to the guard condition could silently introduce a debug-mode panic.

// Before
if timestamp - entry.last_seen < LAST_SEEN_DEBOUNCE_SECS {
    log::debug!("…", timestamp - entry.last_seen,);
// After
let elapsed = timestamp.saturating_sub(entry.last_seen);
if elapsed < LAST_SEEN_DEBOUNCE_SECS {
    log::debug!("…", elapsed,);

/// writing the full user identifier to logs (satisfies the `CodeQL`
/// "cleartext logging of sensitive information" rule).
#[must_use]
pub fn log_id(ec_id: &str) -> &str {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicklog_id should encapsulate the trailing suffix.

Every call site manually appends "…" in the format string (e.g. "'{}…'", log_id(id)). The ellipsis is an implementation detail of the redaction function and should live inside it. If the truncation length ever changes, every format string must be updated.

Fix:

#[must_use]
pub fn log_id(ec_id: &str) -> String {
    let prefix = ec_id.get(..8).unwrap_or(ec_id);
    format!("{prefix}…")
}

Then update callers to remove the manual "…" suffix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants