Reproducible benchmark of Kronos-base on direction (+ validating the demo's own forecasts) — and an offer to test Kronos-large

Hi, and congrats on the AAAI 2026 acceptance 🎉 Thanks for releasing
mini/small/base under MIT; it made this possible.

I ran a small, fully reproducible benchmark of **Kronos-base** (zero-shot) on the
hardest task, direction, across BTCUSDT and gold at several horizons. As a fair
check I also validated the **official demo's own published forecasts** (4,682
hourly 24h "upside probabilities" pulled from the Kronos-demo git history) against
realized Binance prices. Write-up + reproducible code:

  https://gist.github.com/moscowmule2240/1bb5bf350ad58e42199ac350f8768672

Short version: at base scale, directional accuracy is around chance so far, and
the probabilities are a bit overconfident. The gap looks like calibration +
signal strength rather than the underlying approach, which is exactly what tends
to improve with capacity / fine-tuning.
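
To make "around chance" and "overconfident" concrete, here is a minimal sketch of how such claims can be scored. It is not the code from the gist: the function name `direction_metrics`, the 0.5 decision threshold, the bin count, and the synthetic data are all assumptions for illustration. It takes a forecast "upside probability" per window plus the realized up/down outcome and reports hit rate, Brier score, and a small reliability table.

```python
import numpy as np

def direction_metrics(p_up, went_up, n_bins=5):
    """Score probabilistic direction forecasts (hypothetical helper).

    p_up    : forecast probability that price ends higher, in [0, 1]
    went_up : realized outcome, 1 if price ended higher, else 0
    """
    p_up = np.asarray(p_up, dtype=float)
    went_up = np.asarray(went_up, dtype=float)

    # Hit rate: call "up" when p_up >= 0.5 (assumed threshold).
    accuracy = float(np.mean((p_up >= 0.5) == (went_up == 1)))

    # Brier score: mean squared error of the probability; 0.25 is what
    # a constant 0.5 forecast would score.
    brier = float(np.mean((p_up - went_up) ** 2))

    # Reliability table: per probability bin, compare the mean forecast
    # with the realized up-frequency. Overconfidence shows up as mean
    # forecasts further from 0.5 than the realized frequencies.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.digitize(p_up, edges[1:-1])
    table = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            table.append((float(p_up[mask].mean()),
                          float(went_up[mask].mean()),
                          int(mask.sum())))
    return accuracy, brier, table

if __name__ == "__main__":
    # Synthetic data only: outcomes are coin flips, forecasts are noise.
    rng = np.random.default_rng(0)
    p = rng.uniform(0.2, 0.8, size=1000)
    y = rng.integers(0, 2, size=1000)
    acc, brier, table = direction_metrics(p, y)
    print(f"accuracy={acc:.3f} brier={brier:.3f}")
    for mean_p, freq_up, n in table:
        print(f"forecast={mean_p:.2f} realized={freq_up:.2f} n={n}")
```

A hit rate near 0.5 together with a Brier score at or above 0.25 is the pattern that "around chance and a bit overconfident" would produce under these assumptions.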

That's why I'm genuinely curious about **Kronos-large**. If the team is open to it
(or there's a process), I'd love to run this exact benchmark on large and share
everything I find. Happy to follow any terms and take details over email. Rooting
for it 🙂
