---
license: cc-by-nc-sa-4.0
language:
- en
pretty_name: CausalDriveBench
size_categories:
- n<10K
task_categories:
- question-answering
- visual-question-answering
tags:
- autonomous-driving
- causal-reasoning
- counterfactual
- nuscenes
- scene-graph
- benchmark
configs:
- config_name: qa
  data_files:
  - split: test
    path: qa.jsonl
---

CausalDriveBench

A benchmark for causal reasoning in autonomous driving built on top of nuScenes. Each sample bundles a curated causal scene graph, three flavours of multiple-choice / open-ended QA (active, dormant, distractor), and pointers to the raw nuScenes frames so the benchmark stays compact and license-clean.

At a glance

  • Subset uploaded: {{SUBSET}}
  • Samples: {{N_SAMPLES}} across {{N_SCENES}} scenes
  • Tasks: active QA, dormant QA, distractor QA, causal scene graphs

Folder structure

causaldrivebench/
├── README.md
├── LICENSE
├── manifest.json                       (top-level: splits, n_samples, missing-* lists)
├── qa.jsonl                            (flat aggregate: one row per QA pair, all 7,285 QAs)
└── data/
    └── {{SUBSET}}/
        ├── nuscenes-scene-XXXX/
        │   └── SAMPLED_N/
        │       ├── state.json          (ego pose / agents at the anchor frame)
        │       ├── frames.json         (per-timestep paths to nuScenes images, RELATIVE)
        │       ├── calib.json          (camera calibration matrices)
        │       ├── meta.json           (location + nuScenes sample token)
        │       ├── graph.json          (causal scene graph)
        │       └── qa/
        │           ├── active_qa.json
        │           ├── dormant_qa.json
        │           └── distractor_qa.json
        └── ...

qa.jsonl is the streaming-friendly view (one row = one QA pair with scene_id, sample_id, qa_type, plus the QA fields). The per-scene tree under data/ is the authoritative source — use it when you need state / graph / image references for a sample.
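To go from the flat view back to the authoritative tree, a small loader can collect the JSON files of one sample directory. The sketch below relies only on the file names listed in the folder structure above; `load_sample` is a hypothetical helper (not part of the released code), and it skips files that are absent instead of raising.

```python
import json
from pathlib import Path

# File names come from the folder structure above; load_sample itself is a
# hypothetical helper introduced here for illustration.
SAMPLE_FILES = ("state", "frames", "calib", "meta", "graph")
QA_FILES = ("active_qa", "dormant_qa", "distractor_qa")


def load_sample(sample_dir):
    """Return a dict of the parsed JSON files found in one SAMPLED_N directory."""
    sample_dir = Path(sample_dir)
    out = {}
    for name in SAMPLE_FILES:
        path = sample_dir / f"{name}.json"
        if path.is_file():
            out[name] = json.loads(path.read_text(encoding="utf-8"))
    # QA files live in the qa/ subdirectory of the sample.
    out["qa"] = {}
    for name in QA_FILES:
        path = sample_dir / "qa" / f"{name}.json"
        if path.is_file():
            out["qa"][name] = json.loads(path.read_text(encoding="utf-8"))
    return out
```

For example, `load_sample("data/{{SUBSET}}/nuscenes-scene-0001/SAMPLED_0")["graph"]` would return the parsed causal scene graph, provided that file exists.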

Resolving image paths

Raw nuScenes images are not redistributed; download them from the official nuScenes download page. Once extracted, point NUSCENES_ROOT at the directory containing samples/ and sweeps/:

export NUSCENES_ROOT=/path/to/nuscenes

Then, for any sample's frames.json:

import json
import os

# Paths stored in frames.json are relative to NUSCENES_ROOT.
with open("data/{{SUBSET}}/nuscenes-scene-0001/SAMPLED_0/frames.json") as f:
    frames = json.load(f)
img = os.path.join(os.environ["NUSCENES_ROOT"], frames["frames"]["Tp0p0"]["cam_front"])

The data-root placeholder lives in one place only: manifest.json#nuscenes_root_placeholder = "{{ROOT_PLACEHOLDER}}". The per-sample frames.json files don't carry it.
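The same join can be applied to every entry of a frames.json at once. The following is a minimal sketch, assuming the layout implied by the snippet above, where `frames["frames"][timestep][camera]` is a relative path string; `resolve_frame_paths` and `missing_images` are hypothetical helper names.

```python
import os


def resolve_frame_paths(frames, nuscenes_root):
    """Map each timestep and camera to a path under nuscenes_root.

    Assumes frames["frames"][timestep][camera] is a relative path string,
    as in the single-image snippet above.
    """
    resolved = {}
    for timestep, cameras in frames["frames"].items():
        resolved[timestep] = {
            camera: os.path.join(nuscenes_root, rel_path)
            for camera, rel_path in cameras.items()
        }
    return resolved


def missing_images(resolved):
    """List resolved paths that do not exist on disk."""
    return [
        path
        for cameras in resolved.values()
        for path in cameras.values()
        if not os.path.isfile(path)
    ]
```

Running `missing_images(resolve_frame_paths(frames, os.environ["NUSCENES_ROOT"]))` before training or evaluation is one way to catch an incomplete nuScenes download early.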

Loading examples

Snapshot the whole repo:

from huggingface_hub import snapshot_download
root = snapshot_download("causaldrivebench/CausalDriveBench", repo_type="dataset")

Stream only QA via datasets (uses qa.jsonl):

from datasets import load_dataset
ds = load_dataset("causaldrivebench/CausalDriveBench", "qa", split="test")
print(ds[0])  # {scene_id, sample_id, qa_type, id, rung, category, ...}
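If the datasets dependency is not wanted, qa.jsonl can also be read with the standard library. The sketch below tallies rows per qa_type; `count_qa_types` is a hypothetical helper, and the only field it assumes is qa_type, which the row description above lists.

```python
import json
from collections import Counter


def count_qa_types(jsonl_path):
    """Count rows per qa_type in a JSON Lines file, skipping blank lines."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                counts[json.loads(line)["qa_type"]] += 1
    return counts
```

With a local snapshot, `count_qa_types(os.path.join(root, "qa.jsonl"))` (after `import os`) gives a quick check on how many rows of each type were downloaded.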

QA statistics

{{QA_STATS}}

License + attribution

Released under CC BY-NC-SA 4.0. Built on top of nuScenes — by using this dataset you also accept the nuScenes Terms of Use. The scene graphs, QA pairs, and packaging code are original; raw camera frames remain under the original nuScenes license and are not included here.

Citation

@inproceedings{causaldrivebench2026,
  title  = {CausalDriveBench: Evaluating Causal Reasoning in Vision-Language-Action Models for Autonomous Driving},
  author = {Anonymous Authors},
  year   = {2026}
}