Producer + spec for the COSC dataset — Contrail Observations from Sky Cameras.
A research dataset of contrail observations from ground-based sky cameras, paired with per-trajectory-point empirical lifetimes. Built to let researchers benchmark contrail-evolution models (CoCiP, RFM, custom) against detected-from-the-ground reality.
The headline of the v0.1-preview release: two stations, two 13 h videos, ~47k emission points combined, showing how long each contrail remained detectable. The two stations have visibly different censoring regimes — NL000Q (lots of high-altitude traffic slowly drifting through FOV) is censoring-dominated; CAWEC1 has much more in-view dissipation. Censoring is per-row metadata in the dataset so any survival-analysis-aware comparison handles it correctly.
⚠️ The 30 min cap is the Janus pipeline's max_contrail_age default (0.5 h), not the natural maximum. Natural contrail lifetimes commonly run 1-4 h; the v0.1-preview dataset doesn't see that tail. v0.2 will regenerate with a longer horizon (3 h is already supported by the trace generator) once the GT review has been refreshed at the new horizon.
Each row in detection_events.parquet answers the question: which
moment along which flight's track emitted the contrail this mask
saw? Wind has carried the contrails away from the aircraft's
position by the time the camera detects them — so we need to project
every ADS-B trajectory sample forward through ERA5 wind to where its
contrail has drifted by the observation time. Mask polygons sit on
that advected trace, NOT on the aircraft's actual ground track.
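
The forward projection described above can be illustrated with a minimal sketch. It assumes a single constant horizontal wind and a local flat-earth approximation, whereas the real pipeline uses ERA5 wind, which varies with time, position, and altitude. The function name `advect_point` and its parameters are hypothetical and are not part of the cosc package.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def advect_point(lat_deg, lon_deg, u_ms, v_ms, dt_s):
    """Drift a point with a constant horizontal wind for dt_s seconds.

    u_ms is the eastward wind component and v_ms the northward one (m/s).
    Uses a local flat-earth approximation, which is only adequate for
    short drifts; this is an illustrative assumption, not the COSC method.
    """
    dnorth_m = v_ms * dt_s
    deast_m = u_ms * dt_s
    dlat = math.degrees(dnorth_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        deast_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return lat_deg + dlat, lon_deg + dlon

# A point emitted 600 s before the observation, in a 20 m/s westerly wind,
# has drifted 12 km east of where the aircraft was at emission.
lat_now, lon_now = advect_point(43.0, -81.0, u_ms=20.0, v_ms=0.0, dt_s=600.0)
```

A production version would presumably integrate in small time steps and re-sample the wind field at each step, since a contrail that drifts for tens of minutes crosses several wind-grid cells.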
Inside one frame at CAWEC1 (Ontario): GT-verified mask polygons
(red, n=30) on AAL168's contrail, plus the advected contrail trace at
that moment. The cyan polyline and rainbow dots show where ERA5 wind
has carried each emitted trajectory point's contrail by this
observation time (so the rainbow dots are NOT where the aircraft was
at emission; they're where its emissions are NOW).
Yellow rings mark the 15 trajectory points whose advected positions
fall on the mask polygons — i.e. the discrete emission moments
along the flight whose contrails the detector is seeing in these
red blobs. One row in detection_events.parquet corresponds to
each (yellow ring, red polygon) pair at this frame; the dataset is
built from millions of these.
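
The pairing of advected trajectory points with mask polygons can be sketched as a point-in-polygon test. This is an illustration only, assuming both the points and the polygons are already expressed in the same 2D coordinates (for example image pixels). The helper names `point_in_polygon` and `attribute_points` are hypothetical, and the actual logic in cosc/attribute.py may differ.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def attribute_points(advected_points, mask_polygons):
    """Return (point_index, polygon_index) pairs where a point falls in a mask."""
    return [
        (i, j)
        for i, (x, y) in enumerate(advected_points)
        for j, poly in enumerate(mask_polygons)
        if point_in_polygon(x, y, poly)
    ]

# Two toy masks (a square and a triangle) and three advected points
masks = [[(0, 0), (4, 0), (4, 4), (0, 4)], [(10, 10), (12, 10), (11, 13)]]
points = [(2, 2), (11, 11), (6, 6)]
pairs = attribute_points(points, masks)  # one pair per (point, mask) hit
```

Each returned pair plays the role of one (yellow ring, red polygon) match in a frame; the third point falls in no mask and produces no pair.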
This dataset rides on the imaging infrastructure of the
Global Meteor Network (GMN),
the worldwide sky-camera network founded and led by
Dr. Denis Vida
at Western University (London, Ontario, Canada). The GMN's
all-sky cameras and the RMS
software stack (camera capture, astrometric calibration, station
metadata) are what make COSC possible — every station in this
dataset is a GMN camera and every coordinate transform uses RMS
calibration. Please cite GMN and RMS in publications that use COSC;
see specs/DATASET_CARD.md
for references.
- Code (this repository): PolyForm Noncommercial 1.0.0. Free for research, academic, personal, and other non-commercial use. Commercial use requires a separate written agreement.
- Dataset (the published parquet release): CC-BY-NC-4.0. Same scheme, with attribution.
- Commercial use of either: contact the maintainer.
See LICENSE for the code license verbatim and
specs/DATASET_CARD.md for the dataset terms.
cosc-tools/
├── specs/                  Public dataset documentation
│   ├── SCHEMA.md           Normative column-level schema for v0.1
│   └── DATASET_CARD.md     License, scope, cleaning spec, citation
├── cosc/                   Producer package
│   ├── schema.py           Arrow schema definitions for every table
│   ├── attribute.py        Projection-only trajectory-point attribution
│   ├── build.py            Per-video producer
│   └── cli.py              `python -m cosc.cli build <staging_dir>`
├── examples/               Reader examples
│   └── load_v01.ipynb      Minimal load + per-emission-lifetime plot
└── README.md               (this file)
git clone https://github.com/Cybis320/cosc-tools.git
cd cosc-tools
pip install -e .

Download the v0.1-preview release (NL000Q, GT-verified, ~8 MB):

wget https://github.com/Cybis320/cosc-tools/releases/download/v0.1-preview/cosc-v0.1-preview-NL000Q.tar.gz
tar xzf cosc-v0.1-preview-NL000Q.tar.gz

import pandas as pd
base = "cosc-v0.1/country=NL/station=NL000Q/date=2025-10-01"
# Atomic per-frame attributions (the time-resolved evidence)
events = pd.read_parquet(f"{base}/detection_events.parquet")
# Aggregated per-trajectory-point observations (headline table for model comparison)
obs = pd.read_parquet(f"{base}/trajectory_point_observations.parquet")

See examples/load_v01.ipynb for a runnable
quickstart (loads the preview, summarises GT/censoring stats, plots
per-emission-point observed contrail lifetime for one flight). Run with:

pip install -e '.[examples]'
jupyter notebook examples/load_v01.ipynb

See specs/SCHEMA.md for the full column-by-column contract.
python -m cosc.cli build /path/to/staging_dir --out_root /path/to/cosc-v0.1

v0.1: DRAFT. Single station (NL000C). Schema is locked. The cleaning spec
section in DATASET_CARD.md is still being collated and will be finalized
before the v0.1.0 tagged release.
For each row in trajectory_point_observations:
- Run your contrail-evolution model on the emission point (flight_id, emit_ts_utc, emit_lat, emit_lon, emit_alt_geom_m).
- Compare your model's predicted dissipation age to detect_last_age_s.
- Handle censoring: rows with censored_fov_exit = True or censored_video_end = True are right-censored (lifetime ≥ reported, actual may be longer). Standard survival-analysis methods (Kaplan–Meier, Cox PH) handle this.
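
As a rough illustration of the censoring step, the sketch below computes a Kaplan–Meier survival curve by hand with NumPy and pandas. It runs on a small synthetic table that borrows the column names detect_last_age_s, censored_fov_exit, and censored_video_end from the text above; the values are made up, and the function `kaplan_meier` is a hypothetical helper, not part of the cosc package. A dedicated survival-analysis library would normally be preferable for real work.

```python
import numpy as np
import pandas as pd

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: lifetime or censoring time for each row.
    observed: True where dissipation was seen, False where right-censored.
    Returns a DataFrame with one row per distinct event time.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    rows = []
    survival = 1.0
    for t in np.unique(durations[observed]):
        # Rows still under observation just before time t
        at_risk = int(np.sum(durations >= t))
        # Uncensored dissipations that happen exactly at t
        events = int(np.sum((durations == t) & observed))
        survival *= 1.0 - events / at_risk
        rows.append(
            {"age_s": t, "at_risk": at_risk, "events": events, "survival": survival}
        )
    return pd.DataFrame(rows)

# Synthetic stand-in for trajectory_point_observations (values are invented)
obs = pd.DataFrame({
    "detect_last_age_s": [300, 600, 600, 900, 1200, 1800],
    "censored_fov_exit": [False, False, True, False, False, False],
    "censored_video_end": [False, False, False, False, True, True],
})

# A row is right-censored if either censoring flag is set
censored = obs["censored_fov_exit"] | obs["censored_video_end"]
km = kaplan_meier(obs["detect_last_age_s"], ~censored)
```

Censored rows still count toward the at-risk total up to their last detection, which is why dropping them (or treating them as dissipations) would bias the estimated lifetimes downward.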
For higher-resolution comparison, use detection_events (one row per
(flight, mask, frame) attribution) to compare time-resolved predictions
to the empirical detection density curve.
For commercial use of either the code or the dataset, dataset issues, or schema discussions: Luc Busquin <luc.busquin@contrailcast.com>.


