Reproducible benchmark of Kronos-base on direction (+ validating the demo's own forecasts), and an offer to test Kronos-large #323

@moscowmule2240

Hi, and congrats on the AAAI 2026 acceptance 🎉 Thanks for releasing
mini/small/base under MIT; it made this possible.

I ran a small, fully reproducible benchmark of Kronos-base (zero-shot) on the
hardest task, direction, across BTCUSDT/gold and several horizons. As a fairness
check I also validated the official demo's own published forecasts (I pulled
4,682 hourly 24h "upside probabilities" from the Kronos-demo git history) against
realized Binance prices. Write-up + reproducible code:

https://gist.github.com/moscowmule2240/1bb5bf350ad58e42199ac350f8768672

Short version: at base scale, directional accuracy is around chance so far, and
the probabilities are a bit overconfident. But the gap looks like calibration +
signal strength rather than the underlying approach, which is exactly what tends
to improve with capacity / fine-tuning.
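
To make "around chance" and "overconfident" concrete, here is a minimal sketch of how the two could be measured. It is not the code from the gist: the function name `direction_metrics`, the 0.5 threshold, and the toy inputs are all assumptions for illustration only.

```python
def direction_metrics(probs_up, realized_returns):
    """Score upside-probability forecasts against realized returns.

    probs_up: forecast probability (0..1) that price is higher at the horizon.
    realized_returns: realized return over the same horizon, one per forecast.
    Both names are hypothetical; adapt them to your own data layout.
    """
    if not probs_up or len(probs_up) != len(realized_returns):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(probs_up)
    # Outcome is 1 when the price actually rose; a flat return counts as "not up".
    outcomes = [1 if r > 0 else 0 for r in realized_returns]
    # Predicted direction is "up" only when the probability exceeds 0.5.
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(probs_up, outcomes))
    # Brier score: mean squared error of the probability (0.25 for a constant 0.5).
    brier = sum((p - y) ** 2 for p, y in zip(probs_up, outcomes)) / n
    # Average confidence in the predicted side; well above hit rate = overconfident.
    mean_confidence = sum(max(p, 1 - p) for p in probs_up) / n
    return {"hit_rate": hits / n, "brier": brier, "mean_confidence": mean_confidence}


# Toy inputs, invented purely to exercise the function (not real forecasts).
toy_probs = [0.9, 0.8, 0.2, 0.7]
toy_returns = [0.01, -0.02, -0.01, -0.03]
metrics = direction_metrics(toy_probs, toy_returns)
print(metrics)
```

On these toy inputs the hit rate is 0.5 while the mean confidence is about 0.8, which is the pattern described above: direction near chance, probabilities more confident than the outcomes justify.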

That's why I'm genuinely curious about Kronos-large. If the team is open to it
(or there's a process), I'd love to run this exact benchmark on large and share
everything I find. Happy to follow any terms and take details over email. Rooting
for it 🙂
