Cybis320/cosc-tools

cosc-tools

Producer + spec for the COSC dataset — Contrail Observations from Sky Cameras.

A research dataset of contrail observations from ground-based sky cameras, paired with per-trajectory-point empirical lifetimes. Built to let researchers benchmark contrail-evolution models (CoCiP, RFM, custom) against what is actually detected from the ground.

Contrail lifetime distributions — NL000Q + CAWEC1

The headline of the v0.1-preview release: two stations, two 13 h videos, ~47k emission points combined, showing how long each contrail remained detectable. The two stations have visibly different censoring regimes: NL000Q (lots of high-altitude traffic slowly drifting through the FOV) is censoring-dominated, while CAWEC1 has much more in-view dissipation. Censoring is recorded as per-row metadata in the dataset, so survival-analysis-aware comparisons can account for it correctly.

⚠️ The 30 min cap is the Janus pipeline's max_contrail_age default (0.5 h), not the natural maximum. Natural contrail lifetimes commonly run 1-4 h; the v0.1-preview dataset doesn't see that tail. v0.2 will regenerate with a longer horizon (3 h is already supported by the trace generator) once the GT review has been refreshed at the new horizon.

How the attribution works

How COSC attributes a YOLO mask back to discrete emission moments along a flight

Each row in detection_events.parquet answers the question: which moment along which flight's track emitted the contrail this mask saw? Wind has carried the contrails away from the aircraft's position by the time the camera detects them — so we need to project every ADS-B trajectory sample forward through ERA5 wind to where its contrail has drifted by the observation time. Mask polygons sit on that advected trace, NOT on the aircraft's actual ground track.
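The forward-projection idea can be illustrated with a minimal sketch. It assumes a single constant horizontal wind and a spherical Earth. A real ERA5-driven pipeline would presumably interpolate the wind in space, time, and altitude and integrate in small steps. The function name `advect_point` and the example values are hypothetical and are not taken from the cosc package.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def advect_point(lat_deg, lon_deg, u_ms, v_ms, dt_s):
    """Drift a point with a constant wind for dt_s seconds.

    u_ms is the eastward wind component, v_ms the northward component (m/s).
    Assumption: wind is constant over the whole interval and the
    displacement is small enough for a local flat-Earth step.
    """
    lat_rad = math.radians(lat_deg)
    dlat_rad = (v_ms * dt_s) / EARTH_RADIUS_M
    dlon_rad = (u_ms * dt_s) / (EARTH_RADIUS_M * math.cos(lat_rad))
    return lat_deg + math.degrees(dlat_rad), lon_deg + math.degrees(dlon_rad)


# Hypothetical emission point: 20 m/s wind blowing toward the east, observed 10 min later.
new_lat, new_lon = advect_point(52.0, 5.0, u_ms=20.0, v_ms=0.0, dt_s=600.0)
print(new_lat, new_lon)
```

In this example the contrail parcel has drifted 12 km east of where the aircraft emitted it, which is why mask polygons are compared with the advected position rather than with the ground track.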

AAL168 GT-verified attribution overlay — CAWEC1 frame 5968

Inside one frame at CAWEC1 (Ontario): GT-verified mask polygons (red, n=30) lie on AAL168's contrail, along the advected contrail trace at that moment. The cyan polyline and rainbow dots show where ERA5 wind has carried each emitted trajectory point's contrail by this observation time (so the rainbow dots are NOT where the aircraft was at emission; they are where its emissions are NOW). Yellow rings mark the 15 trajectory points whose advected positions fall on the mask polygons, i.e. the discrete emission moments along the flight whose contrails the detector is seeing in these red blobs. One row in detection_events.parquet corresponds to each (yellow ring, red polygon) pair at this frame; the dataset is built from millions of these.
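The "yellow ring on red polygon" pairing can be pictured as a point-in-polygon test in image coordinates. The sketch below uses a plain ray-casting test and is only an illustration: the actual matching rule in cosc/attribute.py (tolerances, projection details) is not described here, and the names `point_in_polygon` and `attribute_points` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attribute_points(advected_px, mask_polygons):
    """Return (point_index, mask_index) pairs where an advected point falls on a mask."""
    return [
        (pi, mi)
        for pi, (x, y) in enumerate(advected_px)
        for mi, poly in enumerate(mask_polygons)
        if point_in_polygon(x, y, poly)
    ]


# Made-up pixel coordinates: one rectangular mask, three advected trajectory points.
masks = [[(10, 10), (30, 10), (30, 20), (10, 20)]]
points = [(5, 15), (20, 15), (40, 15)]
pairs = attribute_points(points, masks)
print(pairs)
```

Only the middle point falls inside the mask, so only that (point, mask) pair would produce an attribution row for the frame in this toy setup.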

Built on the Global Meteor Network

This dataset rides on the imaging infrastructure of the Global Meteor Network (GMN), the worldwide sky-camera network founded and led by Dr. Denis Vida at Western University (London, Ontario, Canada). The GMN's all-sky cameras and the RMS software stack (camera capture, astrometric calibration, station metadata) are what make COSC possible — every station in this dataset is a GMN camera and every coordinate transform uses RMS calibration. Please cite GMN and RMS in publications that use COSC; see specs/DATASET_CARD.md for references.

Licenses

  • Code (this repository): PolyForm Noncommercial 1.0.0. Free for research, academic, personal, and other non-commercial use. Commercial use requires a separate written agreement.
  • Dataset (the published parquet release): CC-BY-NC-4.0. Same non-commercial terms, with attribution required.
  • Commercial use of either: contact the maintainer.

See LICENSE for the code license verbatim and specs/DATASET_CARD.md for the dataset terms.

Project layout

cosc-tools/
├── specs/                      Public dataset documentation
│   ├── SCHEMA.md               Normative column-level schema for v0.1
│   └── DATASET_CARD.md         License, scope, cleaning spec, citation
├── cosc/                       Producer package
│   ├── schema.py               Arrow schema definitions for every table
│   ├── attribute.py            Projection-only trajectory-point attribution
│   ├── build.py                Per-video producer
│   └── cli.py                  `python -m cosc.cli build <staging_dir>`
├── examples/                   Reader examples
│   └── load_v01.ipynb          Minimal load + per-emission-lifetime plot
└── README.md                   (this file)

Quickstart

Install

git clone https://github.com/Cybis320/cosc-tools.git
cd cosc-tools
pip install -e .

Read the dataset

Download the v0.1-preview release (NL000Q, GT-verified, ~8 MB):

wget https://github.com/Cybis320/cosc-tools/releases/download/v0.1-preview/cosc-v0.1-preview-NL000Q.tar.gz
tar xzf cosc-v0.1-preview-NL000Q.tar.gz

Then, in Python:

import pandas as pd
base = "cosc-v0.1/country=NL/station=NL000Q/date=2025-10-01"

# Atomic per-frame attributions (the time-resolved evidence)
events = pd.read_parquet(f"{base}/detection_events.parquet")

# Aggregated per-trajectory-point observations (headline table for model comparison)
obs = pd.read_parquet(f"{base}/trajectory_point_observations.parquet")

See examples/load_v01.ipynb for a runnable quickstart (loads the preview, summarises GT/censoring stats, plots per-emission-point observed contrail lifetime for one flight). Run with:

pip install -e '.[examples]'
jupyter notebook examples/load_v01.ipynb

See specs/SCHEMA.md for the full column-by-column contract.

Build the dataset from a Janus pipeline output

python -m cosc.cli build /path/to/staging_dir --out_root /path/to/cosc-v0.1

Status

v0.1: DRAFT. Single station (NL000Q). Schema is locked. The cleaning spec section in DATASET_CARD.md is still being collated and will be finalized before the v0.1.0 tagged release.

How to compare a model to COSC

For each row in trajectory_point_observations:

  1. Run your contrail-evolution model on the emission point (flight_id, emit_ts_utc, emit_lat, emit_lon, emit_alt_geom_m).
  2. Compare your model's predicted dissipation age to detect_last_age_s.
  3. Handle censoring: rows with censored_fov_exit = True or censored_video_end = True are right-censored (lifetime ≥ reported, actual may be longer). Standard survival-analysis methods (Kaplan–Meier, Cox PH) handle this.

For higher-resolution comparison, use detection_events (one row per (flight, mask, frame) attribution) to compare time-resolved predictions to the empirical detection density curve.
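To make the censoring step concrete, here is a minimal, dependency-free Kaplan-Meier sketch. It is not part of the cosc package. It assumes a row counts as an observed dissipation only when both censoring flags are False; whether rows that reach the 30 min cap need extra handling should be checked against specs/SCHEMA.md. The function name `kaplan_meier` and the toy lifetimes are illustrative.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: lifetimes (e.g. detect_last_age_s values)
    observed:  True where dissipation was seen, False where right-censored
    Returns [(t, S(t))] at each time with at least one observed event.
    """
    pairs = sorted(zip(durations, observed))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group all rows sharing this duration.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        # Censored rows leave the risk set without counting as events.
        n_at_risk -= leaving
    return curve


# Toy lifetimes in seconds; False marks a right-censored row.
ages = [600, 900, 900, 1200, 1800]
seen_dissipating = [True, True, False, True, False]
curve = kaplan_meier(ages, seen_dissipating)
for t, s in curve:
    print(t, round(s, 3))
```

With the real table, the inputs would come from `obs["detect_last_age_s"]` and `~(obs["censored_fov_exit"] | obs["censored_video_end"])`. A model's predicted survival curve can then be compared with this estimate instead of with raw lifetimes, which would understate censored contrails.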

Contact

For commercial use of either the code or the dataset, dataset issues, or schema discussions: Luc Busquin <luc.busquin@contrailcast.com>.
