A Rust workspace that polls remote datasets on a schedule, transforms them, and stores the results.
Workspace layout:

- `core/`: ingestion, storage, transformation, and shared pipeline traits/types
- `orchestration/`: orchestrator and YAML config model
- `cli/`: command-line entrypoint for running an orchestrator from a YAML file
- `examples/`: example configs, including `examples/pmxt.yaml`
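The crate split suggests a pipeline of ingest, transform, and store stages driven by an orchestrator. The following is a minimal sketch of that shape only. The trait names (`Ingest`, `Transform`, `Store`), the `Orchestrator` struct, the `String` error type, the toy stages, and the sleep-based fixed-interval schedule are all hypothetical and are not taken from `core/` or `orchestration/`.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical stage traits; the real traits in `core/` may differ in
// names, signatures, and error types.
trait Ingest {
    fn fetch(&mut self) -> Result<Vec<String>, String>;
}

trait Transform {
    fn apply(&self, rows: Vec<String>) -> Result<Vec<String>, String>;
}

trait Store {
    fn write(&mut self, rows: &[String]) -> Result<(), String>;
}

// Hypothetical orchestrator: runs ingest -> transform -> store once per tick.
struct Orchestrator<I, T, S> {
    ingest: I,
    transform: T,
    store: S,
}

impl<I: Ingest, T: Transform, S: Store> Orchestrator<I, T, S> {
    // One pass through the pipeline; returns the number of rows stored.
    fn run_once(&mut self) -> Result<usize, String> {
        let raw = self.ingest.fetch()?;
        let out = self.transform.apply(raw)?;
        self.store.write(&out)?;
        Ok(out.len())
    }

    // Runs `ticks` passes, sleeping `interval` between consecutive passes.
    fn run(&mut self, ticks: usize, interval: Duration) -> Result<usize, String> {
        let mut total = 0;
        for i in 0..ticks {
            total += self.run_once()?;
            if i + 1 < ticks {
                thread::sleep(interval);
            }
        }
        Ok(total)
    }
}

// Toy source that returns the same rows on every poll.
struct StaticSource(Vec<String>);
impl Ingest for StaticSource {
    fn fetch(&mut self) -> Result<Vec<String>, String> {
        Ok(self.0.clone())
    }
}

// Toy transform that keeps only the first N rows (like `LIMIT N`).
struct Limit(usize);
impl Transform for Limit {
    fn apply(&self, rows: Vec<String>) -> Result<Vec<String>, String> {
        Ok(rows.into_iter().take(self.0).collect())
    }
}

// Toy store that appends rows to an in-memory vector.
#[derive(Default)]
struct MemoryStore(Vec<String>);
impl Store for MemoryStore {
    fn write(&mut self, rows: &[String]) -> Result<(), String> {
        self.0.extend_from_slice(rows);
        Ok(())
    }
}

// Two ticks over 8 source rows with a limit of 5 stores 10 rows in total.
fn run_demo() -> usize {
    let rows: Vec<String> = (1..=8).map(|n| format!("row{n}")).collect();
    let mut orch = Orchestrator {
        ingest: StaticSource(rows),
        transform: Limit(5),
        store: MemoryStore::default(),
    };
    let total = orch.run(2, Duration::ZERO).expect("demo pipeline should not fail");
    assert_eq!(orch.store.0.len(), total);
    total
}
```

Keeping each stage behind its own trait lets the orchestrator stay generic over sources and sinks; a real scheduler would more likely track wall-clock deadlines than sleep a fixed interval.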
From the repository root, pass the path to a YAML config. An absolute path works:

    cargo run -p data-poller-cli -- /home/sarem/projects/trading/data-poller-rs/examples/pmxt.yaml

Because the command is run from the repo root, the relative path also works:

    cargo run -p data-poller-cli -- examples/pmxt.yaml

The CLI expects exactly one argument: the path to a YAML config file.
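The "exactly one argument" rule can be sketched with the standard library alone. This is a hypothetical illustration: the function name `parse_config_path` and the usage message are assumptions, and the real `data-poller-cli` may use a different parser or wording. Everything after `--` in the `cargo run` command is passed to the binary, so `std::env::args()` yields the program path followed by the config path.

```rust
use std::path::PathBuf;

// Hypothetical sketch: accept exactly one argument after the program name.
fn parse_config_path<I>(args: I) -> Result<PathBuf, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    // The first item is the program name, as with std::env::args().
    let program = args.next().unwrap_or_else(|| "data-poller-cli".to_string());
    match (args.next(), args.next()) {
        // One path and nothing after it: accept.
        (Some(path), None) => Ok(PathBuf::from(path)),
        // Zero arguments or more than one: reject with a usage line.
        _ => Err(format!("usage: {program} <config.yaml>")),
    }
}
```

In a `main`, this could be called as `parse_config_path(std::env::args())`, printing the error and exiting with a non-zero status on `Err`.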
`examples/pmxt.yaml` is a ready-to-run example that:

- fetches the PMXT archive index at https://archive.pmxt.dev/Polymarket/v2
- selects parquet listing rows with `pre > span`
- extracts each parquet URL from the nested `a` tag
- runs `SELECT * FROM remote_parquet LIMIT 5`
- writes output to the configured local directory
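The `pre > span` / nested `a` selection can be approximated without an HTML parser to show what the extraction step does. This is a simplified, stdlib-only stand-in, not the workspace's implementation: the function names are hypothetical, and it assumes well-formed markup, double-quoted `href` attributes, and that parquet links end in `.parquet`. It returns each `href` as written and does not resolve relative links.

```rust
// Hypothetical stand-in for `pre > span` + nested `a`: for every <span>
// inside a <pre> block, take the href of its first <a> tag.
fn extract_parquet_urls(html: &str) -> Vec<String> {
    let mut urls = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("<pre") {
        let after = &rest[start..];
        // Limit the search to this <pre> block (or the rest of the input).
        let end = after.find("</pre>").unwrap_or(after.len());
        let pre = &after[..end];
        // Each piece after "<span" is one row; cut it at its closing tag.
        for span in pre.split("<span").skip(1) {
            let row = span.split("</span>").next().unwrap_or("");
            if let Some(href) = first_href(row) {
                // Assumption: parquet files are identified by their extension.
                if href.ends_with(".parquet") {
                    urls.push(href.to_string());
                }
            }
        }
        rest = &after[end..];
    }
    urls
}

// Returns the value of the first double-quoted href on an <a> tag.
fn first_href(fragment: &str) -> Option<&str> {
    let tag = &fragment[fragment.find("<a")?..];
    let start = tag.find("href=\"")? + "href=\"".len();
    let len = tag[start..].find('"')?;
    Some(&tag[start..start + len])
}
```

A production scraper would normally use a real HTML parser with CSS selector support, which handles attribute quoting, entities, and nesting that this sketch ignores.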
Run the example from the repo root:

    cargo run -p data-poller-cli -- examples/pmxt.yaml

Run the workspace test suite from the repo root:

    cargo test --workspace