The public-data SDK for quants, ML pipelines, and AI agents.
mostlyright is a Python + TypeScript SDK for quants, ML engineers, and AI agents — one direct, schema-versioned interface to every public data source that matters. Today's adapters ship weather data (live METAR/ASOS, forecasts, GHCNh climate history, NWS CLI text products) and prediction-market settlements (Kalshi NHIGH/NLOW, Polymarket discovery + settlement, Kalshi + Polymarket trades). Next: SEC filings (EDGAR), equities structured data, Federal Reserve series (FRED), court filings, FDA approvals — anywhere the data is public, we ship the adapter. Local-first: no hosted backend, no API key for the public-data layer, byte-equivalent reproducibility from research to backtest, and leakage-free training pairs for ML pipelines.
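```bash
# Python
pip install 'mostlyrightmd[research]'

# TypeScript / Node
pnpm add mostlyright
```

Requires Python 3.11+ (Python SDK) or Node 18+ (TypeScript SDK). No API key is required for any package below (the optional Meteosat satellite source does need a EUMETSAT Data Store key).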
```python
# Python
import mostlyright

df = mostlyright.research("KNYC", "2025-01-06", "2025-01-12")
print(df.head())
# pandas DataFrame: one row per LST settlement date
# Columns: date, station, cli_high_f, cli_low_f, obs_high_f, obs_low_f,
# obs_mean_f, obs_count, fcst_*, market_close_utc
```

```typescript
// TypeScript
import { research } from "mostlyright";

const rows = await research("KNYC", "2025-01-06", "2025-01-12");
console.log(rows[0]);
// ReadonlyArray<PairsRow>: same schema as Python, JSON-serializable
```

The first call writes a parquet (Python) or JSON-envelope (Node) cache to `~/.mostlyright/cache/`. Subsequent calls for the same window are local-only, with no network access.
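The cache-then-network behavior can be sketched in a few lines of stdlib Python. This illustrates the pattern only and is not the SDK's cache: the `cached_fetch` helper, the key layout, and the JSON file format are hypothetical, while the real cache stores parquet (Python) or a JSON envelope (Node) under `~/.mostlyright/cache/`.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical sketch of a local-first cache; not the SDK's implementation.
def cached_fetch(
    cache_dir: Path,
    station: str,
    start: str,
    end: str,
    fetch: Callable[[str, str, str], list],
) -> list:
    """Return rows for (station, start, end), calling fetch only on a cache miss."""
    # Derive a stable file name from the query window.
    key = hashlib.sha256(f"{station}|{start}|{end}".encode()).hexdigest()[:16]
    path = cache_dir / f"{station}_{key}.json"
    if path.exists():
        # Cache hit: purely local read, no network.
        return json.loads(path.read_text())
    rows = fetch(station, start, end)  # cache miss: call the upstream source once
    cache_dir.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the file byte-stable for identical rows.
    path.write_text(json.dumps(rows, sort_keys=True))
    return rows

calls = []

def fake_fetch(station, start, end):
    # Stand-in for a network call; records each invocation.
    calls.append((station, start, end))
    return [{"date": start, "station": station, "cli_high_f": 41}]

with tempfile.TemporaryDirectory() as d:
    first = cached_fetch(Path(d), "KNYC", "2025-01-06", "2025-01-12", fake_fetch)
    second = cached_fetch(Path(d), "KNYC", "2025-01-06", "2025-01-12", fake_fetch)
```

The second call is served from disk, so `fake_fetch` runs exactly once.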
Full quickstart with concepts at https://mostlyright.md/docs/sdk/.
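Quickstart rows are keyed by LST (local standard time) settlement date. NWS climate reports use local standard time year-round, so the settlement day does not shift with daylight saving. The stdlib sketch below buckets UTC observations into LST days and derives `obs_high_f` / `obs_low_f` / `obs_mean_f` / `obs_count`. The fixed UTC-5 offset (New York standard time) and the `lst_daily_summary` helper are assumptions for illustration, not the SDK's join.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed New York standard-time offset (UTC-5), applied even during DST.
LST = timezone(timedelta(hours=-5))

def lst_daily_summary(obs: list) -> dict:
    """Group (utc_datetime, temp_f) observations by LST calendar date."""
    buckets = defaultdict(list)
    for ts_utc, temp_f in obs:
        buckets[ts_utc.astimezone(LST).date()].append(temp_f)
    return {
        day: {
            "obs_high_f": max(temps),
            "obs_low_f": min(temps),
            "obs_mean_f": sum(temps) / len(temps),
            "obs_count": len(temps),
        }
        for day, temps in sorted(buckets.items())
    }

utc = timezone.utc
obs = [
    (datetime(2025, 1, 6, 12, 0, tzinfo=utc), 38.0),  # 07:00 LST, Jan 6
    (datetime(2025, 1, 6, 20, 0, tzinfo=utc), 44.0),  # 15:00 LST, Jan 6
    (datetime(2025, 1, 7, 3, 0, tzinfo=utc), 35.0),   # 22:00 LST, still Jan 6
    (datetime(2025, 1, 7, 6, 0, tzinfo=utc), 33.0),   # 01:00 LST, Jan 7
]
summary = lst_daily_summary(obs)
```

The 03:00 UTC reading on January 7 still counts toward January 6, because it falls at 22:00 LST.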
Python packages (PyPI):

| Package | Description | Status |
|---|---|---|
| `mostlyrightmd` | Core types, schemas, validators, the `research()` join, and snapshot primitives. Imports as `mostlyright`. | stable |
| `mostlyrightmd-weather` | Weather data fetchers: live METAR (AWC), ASOS archive (IEM), historical observations (GHCNh), and NWS climate text products (CLI). Direct public-API access. | stable |
| `mostlyrightmd-markets` | Prediction-market data: Kalshi NHIGH/NLOW weather-contract resolvers, Polymarket discovery + settlement, and Kalshi + Polymarket trade history. | stable |
| `mostlyrightmd-edgar` | SEC filings (10-K, 10-Q, 8-K): direct EDGAR full-text + facts access. | planned |
| `mostlyrightmd-equities` | Equities structured data: fundamentals, corporate actions, normalized XBRL pulls. | planned |
| `mostlyrightmd-fred` | Federal Reserve economic data (FRED series, observations, releases). | planned |
| `mostlyrightmd-courts` | Court filings: federal (PACER) and state dockets, opinions, party data. | planned |
| `mostlyrightmd-fda` | FDA approvals: drug + medical-device clearance records, post-market events. | planned |
TypeScript packages (npm):

| Package | Description | Status |
|---|---|---|
| `mostlyright` | Meta package: one `import { research } from "mostlyright"` for weather data + prediction-market settlements + the core join. | stable |
| `@mostlyrightmd/core` | Core types, schemas, validators, temporal-safety primitives, and the `research()` join. | stable |
| `@mostlyrightmd/weather` | Weather data fetchers: live METAR (AWC), ASOS archive (IEM), historical observations (GHCNh), and NWS climate text products (CLI). | stable |
| `@mostlyrightmd/markets` | Prediction-market data: Kalshi NHIGH/NLOW weather-contract resolvers, Polymarket discovery + settlement, and Kalshi + Polymarket trade history. | stable |
| `@mostlyrightmd/edgar` | SEC filings (10-K, 10-Q, 8-K): direct EDGAR full-text + facts access. | planned |
| `@mostlyrightmd/equities` | Equities structured data: fundamentals, corporate actions, normalized XBRL pulls. | planned |
| `@mostlyrightmd/fred` | Federal Reserve economic data (FRED series, observations, releases). | planned |
| `@mostlyrightmd/courts` | Court filings: federal (PACER) and state dockets, opinions, party data. | planned |
| `@mostlyrightmd/fda` | FDA approvals: drug + medical-device clearance records, post-market events. | planned |
Pull joined climate + observation rows for a settlement window, compare them against the contract spec, and produce a deterministic settlement decision.

```python
from datetime import date

import mostlyright
from mostlyright.markets.catalog import kalshi_nhigh

# 1. What does the contract settle on?
contract = kalshi_nhigh.resolve("KHIGHNYC", date(2025, 1, 15))
# 2. Pull the data that decides it.
df = mostlyright.research(contract.settlement_station, "2025-01-15", "2025-01-15")
# 3. Apply the threshold and decide.
# Illustrative only: strike_f is a hypothetical value, and the exact comparison
# (>, >=, or a bracket) depends on the contract terms.
strike_f = 40
settles_yes = df["cli_high_f"].iloc[0] > strike_f
```

`research()` stamps a source identity on every row. The validator catches train/infer mismatches at load time instead of silently corrupting predictions in production.
```python
from mostlyright.mode2 import research_by_source
from mostlyright.core import validate_dataframe

train = research_by_source("KNYC", "iem.archive", "2024-01-01", "2024-12-31")
# The validator pins the schema's expected source. SourceMismatchError fires if
# you accidentally route a Mode 1 (fused AWC+IEM+GHCNh) DataFrame through a
# Mode 2 (iem.archive only) schema.
validate_dataframe(train, schema_id="schema.observation.v1")
```

Every response carries a stable `schema.*.v1` URI and serializes to JSON-Schema-validated shapes. Drop responses into MCP tool outputs or OpenAI/Anthropic function-call returns without reshaping.
```typescript
import { research } from "mostlyright";
import { validateRows } from "@mostlyrightmd/core/validator";

const rows = await research("KNYC", "2025-01-06", "2025-01-12");
const result = validateRows(rows, "schema.observation.v1");
// result.audit_log carries the schema URI, source identity, and row count,
// ready to pass through to an agent's tool-call response.
```

Satellite. Native-L2 single-pixel extraction across the entire populated world, with each instrument family carrying its own leakage-safe source identity so that a model trained on one never silently reconciles against another. Satellite data is a feature supplement (cloud-mask / land-surface covariates), not a primary Tmax/Tmin settlement source. It ships as the optional `mostlyrightmd-weather[satellite]` extra (whole-file S3 reads via s3fs + h5netcdf/h5py). There is no hosted backend: it reads the same anonymous public buckets as the AWC/IEM/NWP calls.
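In its simplest form, single-pixel extraction reads the one grid cell closest to a station instead of averaging over an area. A stdlib sketch, assuming the L2 product exposes per-pixel latitude/longitude arrays; `haversine_km`, `nearest_pixel`, and the tiny grid are hypothetical and not the SDK's extractor.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_pixel(lats, lons, station_lat, station_lon):
    """Return (row, col) of the pixel centre nearest to the station (brute force)."""
    best = None
    for i, row in enumerate(lats):
        for j, lat in enumerate(row):
            d = haversine_km(lat, lons[i][j], station_lat, station_lon)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]

# Hypothetical 2x3 grid of pixel centres and cloud-mask values.
lats = [[40.70, 40.70, 40.70], [40.80, 40.80, 40.80]]
lons = [[-74.10, -74.00, -73.90], [-74.10, -74.00, -73.90]]
cloud_mask = [[0, 1, 1], [0, 0, 1]]

# Central Park (KNYC) lies near 40.78 N, 73.97 W.
i, j = nearest_pixel(lats, lons, 40.78, -73.97)
value = cloud_mask[i][j]
```

A full-disk product has millions of pixels, so a real extractor would invert the projection or use a spatial index rather than this brute-force scan.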
| Source identity | Instrument | Coverage | Access |
|---|---|---|---|
| `noaa_goes` | GOES-East/West ABI | Americas + eastern Pacific | anonymous |
| `jma_himawari` | Himawari AHI | Asia-Pacific | anonymous |
| `noaa_viirs` | VIIRS (JPSS) polar swath | global incl. poles | anonymous |
| `eumetsat_meteosat` | Meteosat SEVIRI (MSG-CLM) | Europe/Africa/Indian Ocean | key (EUMETSAT Data Store) |
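Auto-routing picks a source from the station's coverage region. The SDK's actual rule is documented in docs/satellite.md; this stdlib sketch only illustrates the idea, and its ICAO-prefix region lookup is a hypothetical simplification. It reproduces the KSEA, RJTT, and EGLL routing examples, including Meteosat being gated out of auto-routing in favor of a VIIRS fill.

```python
from typing import Optional

# Hypothetical region lookup by ICAO prefix; the SDK's routing rule may differ.
REGION_BY_PREFIX = {
    "K": "americas",       # contiguous United States
    "RJ": "asia_pacific",  # Japan
    "EG": "europe",        # United Kingdom
}

# Regions from the coverage table above; Europe (Meteosat) is gated out of auto-routing.
AUTO_ROUTE = {
    "americas": "noaa_goes",
    "asia_pacific": "jma_himawari",
}
FALLBACK = "noaa_viirs"  # global polar-orbiter fill

def region_for(icao: str) -> Optional[str]:
    # Try the longer prefix first (e.g. "RJ" before "R").
    for n in (2, 1):
        region = REGION_BY_PREFIX.get(icao[:n])
        if region:
            return region
    return None

def auto_route(icao: str) -> str:
    """Pick a source identity for a station when no explicit satellite is given."""
    return AUTO_ROUTE.get(region_for(icao), FALLBACK)
```

Unknown regions fall through to the global polar-orbiter fill, so every station gets some source.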
```bash
pip install 'mostlyrightmd-weather[satellite]'
```

```python
from mostlyright.weather.satellite import satellite

# Explicit source.
df = satellite("RJTT", "himawari9", product="AHI-L2-FLDK-Clouds", start=..., end=...)

# Omit satellite= to AUTO-ROUTE by the station's coverage region.
df = satellite("KSEA", start=..., end=...)  # -> noaa_goes (GOES-West/PACUS)
df = satellite("RJTT", start=..., end=...)  # -> jma_himawari
df = satellite("EGLL", start=..., end=...)  # -> noaa_viirs fill (Meteosat gated from auto-route; see below)

# The Meteosat SEVIRI projection has landed, but its live Data-Store GRIB reader
# is a forward seam, so it is reachable only by EXPLICIT satellite= (and gated
# out of auto-routing). An explicit fetch surfaces a clear "not yet wired" signal:
df = satellite("EGLL", "meteosat-0deg", start=..., end=...)  # needs a Data-Store key
```

Live vs hosted. `delivery="live"` (the default) parses the public data directly. `delivery="hosted"` is the Phase-27 paid-adapter seam and raises a clear "arrives in Phase 27" error; there is no `api.mostlyright.md` call anywhere. The future paid adapter will share each native source identity (byte-identical to live self-extraction) and will be distinguished only by the informational `delivery` lineage column.
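The live/hosted split amounts to a dispatch on `delivery` in which both paths emit the same source identity and differ only in lineage. A hypothetical sketch: the `fetch_pixels` name, the row shape, and the use of `NotImplementedError` are assumptions, not the SDK's API.

```python
from typing import Literal

Delivery = Literal["live", "hosted"]

def fetch_pixels(source: str, delivery: Delivery = "live") -> list:
    """Hypothetical dispatcher: same source identity, lineage recorded separately."""
    if delivery == "hosted":
        # Placeholder for a future paid adapter; nothing is fetched from a hosted API.
        raise NotImplementedError("hosted delivery arrives in Phase 27")
    if delivery != "live":
        raise ValueError(f"unknown delivery: {delivery!r}")
    # Live path: self-parse public data (stubbed here with a single row).
    rows = [{"source": source, "cloud_mask": 1}]
    # Lineage is informational only; it never changes the source identity.
    return [{**row, "delivery": delivery} for row in rows]

live_rows = fetch_pixels("noaa_goes")
try:
    fetch_pixels("noaa_goes", delivery="hosted")
    hosted_error = None
except NotImplementedError as exc:
    hosted_error = str(exc)
```

Keeping lineage in its own column means a model keyed on `source` treats live and hosted rows as the same feature.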
The fleet bulk/training path is `python -m mostlyright.weather.satellite backfill`: per-(satellite, year, month) slices, crash-safe resume, `--mirror aws|gcp`, and a Thread/Process split. `max_workers` and the S3 rate cap are probe-derived constants; run `python -m mostlyright.weather.satellite probe` to re-measure them. See docs/satellite.md for the source-identity table, coverage map, auto-routing, cheap-CONUS steering, DSRF gating, the 28 TB / near-data reality, and the EUMETSAT key setup.
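Crash-safe resume over per-(satellite, year, month) slices is commonly built from one completion marker per slice, written atomically only after the slice finishes. A stdlib sketch of that pattern; the marker layout and the `slices` / `backfill` / `flaky` names are hypothetical and not the CLI's internals.

```python
import os
import tempfile
from itertools import product
from pathlib import Path
from typing import Callable

def slices(satellites, years, months) -> list:
    """Enumerate (satellite, year, month) work units in a deterministic order."""
    return list(product(satellites, years, months))

def backfill(state_dir: Path, work, process_slice: Callable[[str, int, int], None]) -> list:
    """Run each slice once; slices with a completion marker are skipped on restart."""
    state_dir.mkdir(parents=True, exist_ok=True)
    done_now = []
    for sat, year, month in work:
        marker = state_dir / f"{sat}-{year:04d}-{month:02d}.done"
        if marker.exists():
            continue  # finished in an earlier run
        process_slice(sat, year, month)
        # Temp file + os.replace, so a crash never leaves a half-written marker.
        tmp = marker.with_suffix(".tmp")
        tmp.write_text("ok")
        os.replace(tmp, marker)
        done_now.append((sat, year, month))
    return done_now

work = slices(["noaa_goes"], [2024], [1, 2, 3])
seen = []

def flaky(sat, year, month):
    # Simulate a crash the first time month 2 is processed.
    seen.append(month)
    if month == 2 and seen.count(2) == 1:
        raise RuntimeError("simulated crash")

with tempfile.TemporaryDirectory() as d:
    state = Path(d)
    try:
        backfill(state, work, flaky)
    except RuntimeError:
        pass  # the first run dies partway through
    resumed = backfill(state, work, flaky)
```

On restart, January is skipped because its marker exists; only February and March run.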
- No hosted backend. Direct calls to public APIs (NOAA, NWS, IEM, Kalshi, Polymarket). No proxy. No vendor account. No rate-limited tier.
- Local-first cache. Parquet (Python) or JSON envelope (Node) at `~/.mostlyright/cache/`. Byte-stable across runs, so backtests are deterministic.
- Schema-versioned outputs. Every response carries a stable `schema.*.v1` URI. Train/infer source mismatches fail loudly instead of silently corrupting models.
- Python + TypeScript peers. Same `research()` shape, byte-equivalent on the parity fixtures. Use whichever runtime your stack prefers.
- MIT licensed. Use it commercially. Fork it. Ship it.
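Failing loudly on a train/infer source mismatch boils down to comparing each row's source identity with the one the schema pins. A hypothetical stdlib sketch: `SourceMismatchError` echoes the validator's error name, but this class, the `check_sources` helper, the `source` column name, and the `awc.metar` identity are assumptions for illustration, not the SDK's implementation.

```python
class SourceMismatchError(ValueError):
    """Raised when rows come from a different source than the schema expects."""

def check_sources(rows: list, expected_source: str) -> int:
    """Return the row count if every row matches expected_source; raise otherwise."""
    bad = sorted({r.get("source") for r in rows if r.get("source") != expected_source}, key=str)
    if bad:
        raise SourceMismatchError(f"expected {expected_source!r}, found {bad}")
    return len(rows)

train_rows = [{"source": "iem.archive", "temp_f": 41.0}] * 3
# A fused frame mixing in a second (hypothetical) source identity.
fused_rows = [{"source": "iem.archive", "temp_f": 41.0}, {"source": "awc.metar", "temp_f": 42.0}]

n_ok = check_sources(train_rows, "iem.archive")
try:
    check_sources(fused_rows, "iem.archive")
    mismatch = None
except SourceMismatchError as exc:
    mismatch = str(exc)
```

Raising at load time turns a silent distribution shift into an immediate, attributable error.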
- Quickstart + concepts: https://mostlyright.md/docs/sdk/
- API reference: https://mostlyright.md/docs/sdk/ (Python + TypeScript reference auto-generated per release)
- Migration from the legacy hosted-API client: https://mostlyright.md/docs/sdk/migration/legacy/
Bug reports, feature requests, and pull requests are welcome. See CONTRIBUTING.md for the development workflow (fork → branch → PR) and the test gate. See CODE_OF_CONDUCT.md for community guidelines.
For security issues, see SECURITY.md — do not file public issues for vulnerabilities; email vu@mostlyright.md instead.
MIT. See LICENSE.
mostlyright calls the following public APIs directly. We are grateful for the work that makes weather and market data accessible at the public-API layer:
- NOAA Aviation Weather Center (AWC): live METAR feeds
- Iowa State University Iowa Environmental Mesonet (IEM): ASOS archive + CLI text products
- NOAA National Centers for Environmental Information (NCEI): GHCNh historical observations
- National Weather Service (NWS): climate data products (CLI) and forecast model outputs
- Open-Meteo: multi-provider forecast aggregation (added in Phase 20). 36 models across NCEP / ECMWF / DWD / Météo-France / Asia / Oceania / Europe / GEM Canada. Leakage-safe via per-cycle endpoints and a conservative `issued_at` lower bound. See docs/forecast-sources.md.
- Kalshi + Polymarket: public prediction-market metadata + settlement feeds
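One way to read "conservative `issued_at` lower bound": a forecast is never treated as available before its model cycle time plus a safety lag, so a backtest cannot see a forecast earlier than it could have been downloaded. A stdlib sketch of that filter; the six-hour lag, the `gfs` label, and the `issued_at` / `usable_forecasts` helpers are hypothetical, and docs/forecast-sources.md describes the SDK's actual rule.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical safety lag between a cycle's start and when its output is assumed usable.
ASSUMED_LAG = timedelta(hours=6)

def issued_at(cycle: datetime) -> datetime:
    """Earliest time a forecast from this cycle may be used (conservative)."""
    return cycle + ASSUMED_LAG

def usable_forecasts(forecasts: list, decision_time: datetime) -> list:
    """Keep only forecasts whose conservative issued_at is not after the decision time."""
    return [f for f in forecasts if issued_at(f["cycle"]) <= decision_time]

utc = timezone.utc
forecasts = [
    {"model": "gfs", "cycle": datetime(2025, 1, 15, 0, tzinfo=utc), "tmax_f": 40.0},
    {"model": "gfs", "cycle": datetime(2025, 1, 15, 6, tzinfo=utc), "tmax_f": 42.0},
    {"model": "gfs", "cycle": datetime(2025, 1, 15, 12, tzinfo=utc), "tmax_f": 43.0},
]
decision = datetime(2025, 1, 15, 13, tzinfo=utc)
usable = usable_forecasts(forecasts, decision)
```

At 13:00 UTC the 12Z cycle is excluded: under the assumed lag it is not usable until 18:00 UTC.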