feat(cwop): v0.2 persistence + history() backtest replay #83

Merged
helloiamvu merged 4 commits into main from sprint0/vu-cwop-v02
Jun 21, 2026
Conversation

@helloiamvu

Member

Summary

CWOP v0.2: a standalone persistence layer + a parity-safe backtest-replay path so ML strategies can train on Personal Weather Station history — without ever wiring CWOP into the parity-critical Kalshi NHIGH/NLOW settlement join.

CWOP stays on its own island. The four parity-frozen files are untouched (verified by git diff main): merge/observations.py, research.py, live/_sources.py, schema.observation.v1. There is no include_cwop on research() — that path aggregates max/min(temp_f) source-blind, so a single hot-rooftop or indoor PWS would corrupt a settlement target. CWOP reaches models only through its own history(), a DataFrame the strategy joins itself.
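
To illustrate why a source-blind max/min is unsafe here, the following is a minimal sketch with made-up readings (the values and the `source` labels are assumptions for illustration, not data from this PR). It shows how one outlier PWS reading becomes the daily high when the aggregation ignores where a reading came from.

```python
# Illustrative readings only; the values and source labels are made up.
readings = [
    {"source": "asos", "temp_f": 88.0},
    {"source": "asos", "temp_f": 91.0},
    {"source": "cwop", "temp_f": 104.5},  # hypothetical hot-rooftop PWS
]


def source_blind_high(rows):
    """Daily high computed without looking at where a reading came from."""
    return max(r["temp_f"] for r in rows)


def official_only_high(rows):
    """Daily high restricted to the non-PWS readings."""
    return max(r["temp_f"] for r in rows if r["source"] != "cwop")


blind = source_blind_high(readings)      # the PWS outlier wins
official = official_only_high(readings)  # unaffected by the PWS row
```

Keeping CWOP out of that path entirely, rather than filtering inside it, means the settlement aggregation never has to know that PWS rows exist.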

New public surface (mostlyright.weather.cwop)

  • history(station, from_date, to_date, *, qc_status=None) -> schema.cwop.v1 DataFrame — backtest replay from the persisted cache (source="cwop.cache"). Inclusive date range; optional QC filter (e.g. qc_status="clean"). Raises NoCWOPDataError on an empty range — never an empty frame.
  • persist_observations(observations) -> int — write live observations to the monthly parquet cache (the collection seam for snapshot() / stream() output).
  • snapshot(..., persist=True) — collect and persist in one call (side effect; the returned live frame is unchanged, still source="cwop.live").

Persistence (weather/cwop/_cache.py)

  • Layout: $HOME/.mostlyright/cache/cwop/{station}/{year}/{month}.parquet (honors MOSTLYRIGHT_CACHE_DIR).
  • Standalone island — does NOT import the parity-coupled weather/cache.py. filelock-guarded read-modify-write merge under a single per-partition lock (no lost updates), dedup by (station_id, observed_at) first-seen-wins, atomic .tmp + os.replace.
  • No current-month skip. CWOP is ephemeral (live-only APRS-IS, no REST backfill), so the current month is exactly what a live collector must retain — the opposite of the re-fetchable AWC/IEM/GHCNh observation cache.
  • CWOP station ids (CW0875, ham callsigns) aren't 4-letter ICAO, so the cache uses its own _CWOP_STATION_RE path validator + assert_path_under backstop.
  • Timestamps normalized to tz-aware UTC at the write chokepoint, so dedup is instant-stable and the on-disk column is uniformly timestamp[us, UTC].
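
The read-modify-write merge described above can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions: JSON stands in for parquet, the per-partition filelock is omitted, and the helper names (`merge_first_seen_wins`, `atomic_write_json`, `persist`) and the zero-padded `06` file name are hypothetical, not the names or layout used in `_cache.py`.

```python
import json
import os
import tempfile
from pathlib import Path


def merge_first_seen_wins(existing, incoming):
    """Dedup by (station_id, observed_at); the first row seen for a key is kept."""
    merged = {}
    for row in [*existing, *incoming]:
        merged.setdefault((row["station_id"], row["observed_at"]), row)
    return list(merged.values())


def atomic_write_json(path: Path, rows) -> None:
    """Write to a sibling .tmp file, then swap it in with os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(rows))
    os.replace(tmp, path)  # readers see the old file or the new one, never a partial write


def persist(path: Path, incoming) -> int:
    """Merge incoming rows into the partition; return how many rows were new.

    The real cache is described as doing this under a per-partition lock;
    that lock is left out of this sketch.
    """
    existing = json.loads(path.read_text()) if path.exists() else []
    merged = merge_first_seen_wins(existing, incoming)
    atomic_write_json(path, merged)
    return len(merged) - len(existing)


with tempfile.TemporaryDirectory() as d:
    part = Path(d) / "CW0875" / "2026" / "06.json"
    first = persist(part, [
        {"station_id": "CW0875", "observed_at": "2026-06-21T05:00:00+00:00", "temp_f": 71.0},
    ])
    second = persist(part, [
        # Same key as the stored row: dropped, the stored 71.0 is kept.
        {"station_id": "CW0875", "observed_at": "2026-06-21T05:00:00+00:00", "temp_f": 99.0},
        {"station_id": "CW0875", "observed_at": "2026-06-21T05:05:00+00:00", "temp_f": 71.4},
    ])
    stored = json.loads(part.read_text())
```

Because the merge reads the existing partition before writing, two writers racing without a lock could each drop the other's rows, which is why the PR holds a single lock around the whole read-modify-write.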

Schema

schema.cwop.v1 now accepts the source union {"cwop.live", "cwop.cache"} via _registered_sources (singular _registered_source dropped so legitimate cache reads don't pollute the source-drift audit trail). build_cwop_dataframe gained a source param. The parity-frozen schema.observation.v1 enum is untouched — no PWS member.

Review

Architecture self-review + two independent adversarial review rounds (codex was rate-limited mid-run, so an independent reviewer substituted). Round 1 found 1 BLOCKER (mixed naive/aware observed_at defeating dedup + tz-stripping storage), 2 MEDIUM (history range-guard stricter than the window; false source_drift_allowed audit on cache reads), 1 LOW (null qc_status → SchemaValidationError). All fixed; round 2 verified the fixes and confirmed no new issues and the firewall intact. The two non-blocking follow-ups (qc-filter/materialized-value consistency, stale docstrings) are also resolved.

Tests

uv run pytest -m "not live" → 3240 passed, 414 skipped (baseline 3193 + 47 new CWOP persistence/history tests). New-module line coverage (sysmon lane): 93% _cache / 100% _history. Lint + format clean.

Parity safety

git diff main shows zero changes to merge/observations.py, research.py, live/_sources.py, and the canonical core/schemas/. No "cwop" key in any merge/live/research file. CWOP persistence is a physically separate cache tree and a separate module that never imports the parity cache.

TS Parity

N/A for this phase (Python-only), parity ticket deferred. The v0.1 CWOP adapter already ships Python-first with a TS parity ticket for the live path (APRS socket → Worker/ReadableStream). v0.2 persistence is a local parquet cache — TS has no equivalent local-disk story in-browser, and the history() query maps to whatever local/edge storage the TS SDK adopts later. No TS-visible public-API contract changes here that aren't already gated behind the existing CWOP TS parity ticket. No browser/bundle constraints introduced (pure pyarrow/filelock, server-side).

python_only: true — CWOP v0.2 persistence + history() is a Python-only local-cache feature behind the CWOP firewall; no cross-SDK contract, tracked by the existing CWOP TS parity ticket.

🤖 Generated with Claude Code

Adds a standalone CWOP persistence layer and a parity-safe access path so ML
strategies can replay personal-weather-station history for model training —
without ever wiring CWOP into the parity-critical Kalshi settlement join.

New surface (weather.cwop):
- history(station, from_date, to_date, *, qc_status=None) -> schema.cwop.v1 DataFrame
  (source="cwop.cache"); the backtest-replay entry point. Raises NoCWOPDataError
  on an empty range — never an empty frame.
- persist_observations(observations) -> int — write live observations to the
  monthly parquet cache (the collection seam for snapshot()/stream() output).
- snapshot(..., persist=True) — collect AND persist in one call (side effect;
  the returned live frame is unchanged, still source="cwop.live").

Persistence (weather/cwop/_cache.py):
- $HOME/.mostlyright/cache/cwop/{station}/{year}/{month}.parquet
  (honors MOSTLYRIGHT_CACHE_DIR).
- Standalone island: does NOT import the parity-coupled weather/cache.py.
- filelock-guarded read-modify-write merge, dedup by (station_id, observed_at)
  first-seen-wins, atomic .tmp + os.replace.
- NO current-month skip — CWOP is ephemeral (no REST backfill), so the current
  month is exactly what a live collector must retain.
- Own _CWOP_STATION_RE path validator (CWOP ids are not 4-letter ICAO).
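
A station-id path validator with a containment backstop might look like the sketch below. The regex is a guess for illustration only (the actual `_CWOP_STATION_RE` pattern is not shown in this PR), and `partition_path` is a hypothetical helper name; the containment check plays the role the description assigns to `assert_path_under`.

```python
import re
from pathlib import Path

# Hypothetical pattern: 3 to 9 uppercase letters or digits.
# The real _CWOP_STATION_RE may be stricter or looser.
STATION_RE = re.compile(r"[A-Z0-9]{3,9}")


def partition_path(cache_root: Path, station: str, year: int, month: int) -> Path:
    """Build the monthly partition path, rejecting ids that could escape the cache root."""
    station = station.upper()
    if not STATION_RE.fullmatch(station):
        raise ValueError(f"invalid CWOP station id: {station!r}")
    root = cache_root.resolve()
    path = (root / "cwop" / station / f"{year:04d}" / f"{month:02d}.parquet").resolve()
    # Backstop: even if the regex were wrong, the result must stay under the root.
    if not path.is_relative_to(root):
        raise ValueError(f"path escapes cache root: {path}")
    return path


example = partition_path(Path("/tmp/cache"), "cw0875", 2026, 6)
```

Validating first and checking containment second gives two independent guards, so a future loosening of the pattern cannot by itself turn a station id into a path traversal.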

Schema: schema.cwop.v1 now accepts both "cwop.live" and "cwop.cache" via
_registered_sources; build_cwop_dataframe takes a source param.

Parity firewall unchanged: merge/observations.py, research.py, live/_sources.py,
and schema.observation.v1 are all untouched (verified by diff). No include_cwop
on research() — that path aggregates max/min source-blind and would corrupt
NHIGH/NLOW targets.

Tests: +40 (30 cache, 10 history); full non-live suite 3233 passed. New-module
line coverage 94% (_cache) / 96% (_history) via the sysmon lane.
Independent adversarial review of the v0.2 persistence/history feature.

BLOCKER — mixed naive/aware observed_at defeated dedup and tz-stripped the
stored timestamp column. write_cwop_cache now normalizes observed_at/
knowledge_time to tz-aware UTC at the write chokepoint (before the dedup key is
computed), so a naive and an aware spelling of the same instant collapse to one
row and the on-disk pyarrow column is uniformly timestamp[us, UTC].
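
A minimal sketch of why normalizing before computing the dedup key matters. The helper `as_utc` is hypothetical, and treating a naive timestamp as UTC is an assumption of this sketch, not something the commit states.

```python
from datetime import datetime, timedelta, timezone


def as_utc(ts: datetime) -> datetime:
    """Return a tz-aware UTC datetime (assumption: naive input is already UTC)."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


# Three spellings of the same instant.
naive = datetime(2026, 6, 21, 5, 0)
aware = datetime(2026, 6, 21, 5, 0, tzinfo=timezone.utc)
offset = datetime(2026, 6, 21, 1, 0, tzinfo=timezone(timedelta(hours=-4)))

# Without normalization the naive spelling never compares equal to the aware ones,
# so it survives as its own dedup key.
raw_keys = {("CW0875", t) for t in (naive, aware, offset)}

# With normalization all three collapse to a single key.
normalized_keys = {("CW0875", as_utc(t)) for t in (naive, aware, offset)}
```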

MEDIUM — history() range guard was stricter than the window it gated: it
collapsed a bare `to` date to midnight while read_cwop_window expands it to
end-of-day, spuriously rejecting valid asymmetric ranges like
(datetime 18:00, date same-day). The guard now uses the same _as_utc_datetime
normalization as the window.
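
The mismatch can be reproduced with a small sketch. The function below is a hypothetical stand-in for `_as_utc_datetime` (its real signature is not shown here); the `end_of_day` flag is an assumption used to contrast the old and new guard behaviour.

```python
from datetime import date, datetime, time, timezone


def as_utc_datetime(value, *, end_of_day=False):
    """Normalize a date or datetime to tz-aware UTC.

    A bare date becomes midnight, or the last microsecond of the day
    when end_of_day is True. datetime is checked first because it is a
    subclass of date.
    """
    if isinstance(value, datetime):
        if value.tzinfo is None:
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)
    clock = time.max if end_of_day else time.min
    return datetime.combine(value, clock, tzinfo=timezone.utc)


start = datetime(2026, 6, 21, 18, 0, tzinfo=timezone.utc)
end = date(2026, 6, 21)

# Old guard: the bare `to` date collapses to midnight, so 18:00 looks "after" it.
old_guard_rejects = start > as_utc_datetime(end)

# New guard: the same end-of-day expansion the window uses, so the range is valid.
new_guard_rejects = start > as_utc_datetime(end, end_of_day=True)
```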

MEDIUM — a legitimate cwop.cache read logged a false `source_drift_allowed`
audit event, because the validator's audit block keys off the legacy singular
_registered_source. CwopSchema now declares the union via _registered_sources
ONLY (no singular), so genuine union-source reads no longer pollute the
train/infer-mismatch audit trail. No validator change.

LOW — history() rebuilt qc_status via dict-default, which leaves an explicit
None cell as None and fails the non-nullable enum check (SchemaValidationError
instead of the documented NoCWOPDataError/DataFrame contract). Now coerces with
`or "unknown"`.
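
The difference between the two spellings is plain Python behaviour and can be shown with an illustrative row (the dict below is made up):

```python
row = {"station_id": "CW0875", "qc_status": None}

# dict.get only falls back to the default when the key is missing.
# An explicit None value is returned as-is.
via_default = row.get("qc_status", "unknown")

# `or` also replaces a present-but-None value.
via_or = row.get("qc_status") or "unknown"

# Both spellings agree when the key is absent altogether.
missing_key = {}.get("qc_status", "unknown")
```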

+6 regression tests. Full non-live suite 3239 passed; new-module line coverage
93% (_cache) / 100% (_history).
…e docs

- history() qc_status filter now matches on the SAME null→"unknown" coercion
  _row_to_observation applies, so a persisted row stored with a null verdict is
  reachable via qc_status="unknown" (predicate now matches the materialized
  value the caller sees). +1 regression test.
- Refresh CwopSchema docstring + the `source` ColumnSpec note for the
  {cwop.live, cwop.cache} source union (were stale after the v0.2 change).
@helloiamvu helloiamvu requested a review from Tarabcak June 21, 2026 05:03
@github-actions

github-actions Bot commented Jun 21, 2026


Docs-required check: PASS

API-surface change includes docs updates — no reminder needed.

API-surface files changed:

packages/weather/src/mostlyright/weather/cwop/__init__.py
packages/weather/src/mostlyright/weather/cwop/_cache.py
packages/weather/src/mostlyright/weather/cwop/_history.py
packages/weather/src/mostlyright/weather/cwop/_schema.py
packages/weather/src/mostlyright/weather/cwop/_snapshot.py

Docs files changed:

CLAUDE.md
docs/cwop-adapter.md

@github-actions

github-actions Bot commented Jun 21, 2026


Parity ticket gate: PASSED

parity-ticket-check: PR does not touch parity-trigger surface; gate skipped.

See CROSS-SDK-SYNC.md §2 for the workflow.

Stamp the partition's normalized (upper) station id onto every persisted
row, mirroring the authoritative source stamp. Persisting via a lowercase
id (cw0875) lands in the CW0875 partition but previously kept the raw
station_id, so the (station_id, observed_at) dedup key diverged and
casing variants stored as distinct rows — history() then returned
duplicates for one station. (codex P2)
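
A minimal sketch of the casing bug and its fix, using made-up rows. The `dedup` helper and its `stamp_partition_id` flag are hypothetical names for illustration; they only model the before/after behaviour described in this commit.

```python
def dedup(rows, *, stamp_partition_id):
    """First-seen-wins dedup on (station_id, observed_at).

    With stamp_partition_id=True the normalized (upper) id is written onto
    each row before the key is computed, as the fix describes.
    """
    seen = {}
    for row in rows:
        station = row["station_id"].upper() if stamp_partition_id else row["station_id"]
        seen.setdefault((station, row["observed_at"]), {**row, "station_id": station})
    return list(seen.values())


rows = [
    {"station_id": "cw0875", "observed_at": "2026-06-21T05:00:00+00:00", "temp_f": 71.0},
    {"station_id": "CW0875", "observed_at": "2026-06-21T05:00:00+00:00", "temp_f": 71.0},
]

before_fix = dedup(rows, stamp_partition_id=False)  # casing variants kept as distinct rows
after_fix = dedup(rows, stamp_partition_id=True)    # one row, stamped CW0875
```
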
@helloiamvu helloiamvu merged commit 7f489a1 into main Jun 21, 2026
19 checks passed