VCBench: A Streaming Counting Benchmark for Spatial-Temporal State Maintenance in Long Videos

VCBench is a streaming counting benchmark for long videos. It treats counting as a minimal probe for diagnosing spatial-temporal state maintenance in video-language models. The benchmark queries a model at multiple time points during playback and measures how its predictions evolve over time, rather than only checking a single final answer.

Abstract

VCBench decomposes counting into eight subcategories across two axes: object counting and event counting. Object counting includes current-state snapshots, state deltas, identity-tracking counts, and windowed gains. Event counting includes atomic actions, state transitions, episodic segments, and periodic actions. The dataset contains 406 videos, 1,000 questions, 4,576 query points, and 10,071 annotated event or state-change moments. The evaluation protocol uses three complementary metrics: GPA for numerical precision, MoC for monotonic consistency, and UDA for update-direction accuracy.

What Is Included

data/vcbench_eval.jsonl
data/vcbench_data.jsonl
eval/demo_gemini.py
eval/unify_results.py
eval/compute_metrics.py
run_gemini_eval.sh
requirements.txt
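
The two annotation files use the `.jsonl` extension. A minimal sketch for peeking at them, assuming only that each non-empty line holds one JSON value. The helper name `read_jsonl` is illustrative and not part of the release, and no field names are assumed:

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list with one parsed JSON value per non-empty line of `path`."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    eval_path = Path("data/vcbench_eval.jsonl")
    if eval_path.exists():
        records = read_jsonl(eval_path)
        print(f"{len(records)} records in {eval_path}")
        # Show the keys of the first record rather than assuming a schema.
        if records and isinstance(records[0], dict):
            print("keys of first record:", sorted(records[0]))
```

Printing the keys of the first record is a quick way to learn the actual schema before writing any code against it.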

What This Release Can Do

Run a Gemini evaluation demo on VCBench
Convert raw per-query-point results into unified per-question format
Compute GPA, MoC, and UDA

This release is designed so that someone who clones the repo can follow the README and run the provided scripts.

Download Data From Hugging Face

Download the benchmark videos from the Hugging Face dataset:

huggingface-cli download buaaplay/VCBench --repo-type dataset --local-dir data/videos
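
If you prefer to download from Python, the sketch below is a rough equivalent of the command above. It assumes the `huggingface_hub` package is installed (it is the package that provides `huggingface-cli`). The wrapper name `download_vcbench` is illustrative and not part of the release:

```python
def download_vcbench(local_dir="data/videos", repo_id="buaaplay/VCBench"):
    """Download a snapshot of the dataset repo into `local_dir`; return the local path."""
    # Imported lazily so this module can be imported without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # same as --repo-type dataset on the CLI
        local_dir=local_dir,  # same as --local-dir data/videos on the CLI
    )
```

Calling `download_vcbench()` with no arguments targets the same `data/videos` directory that the demo script expects.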

The demo script expects the source videos to be organized like this:

data/videos/
  RoomTour3D/
    -FZTi5EfPSQ.mp4
  scannetpp/
    09c1414f1b.mp4
  ...
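
Before running the demo, it can help to confirm that the download produced this layout. A minimal sketch, assuming only the two-level `source/video.mp4` structure shown above. The function name `count_videos_by_source` is illustrative:

```python
from collections import Counter
from pathlib import Path


def count_videos_by_source(video_dir):
    """Map each immediate subfolder of `video_dir` to its number of .mp4 files."""
    counts = Counter()
    # Match exactly one directory level below the root, e.g. scannetpp/xyz.mp4.
    for video in Path(video_dir).glob("*/*.mp4"):
        counts[video.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    for source, n in sorted(count_videos_by_source("data/videos").items()):
        print(f"{source}: {n} videos")
```

An empty result usually means the videos landed in a different directory or one level deeper than expected.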

Install

pip install -r requirements.txt

One-Command Demo

Set your Gemini key:

export GEMINI_API_KEY="your-gemini-api-key"

Then run the provided shell script:

bash run_gemini_eval.sh --video-dir data/videos --limit 5

You can also override the defaults with environment variables such as VIDEO_DIR, INPUT_JSONL, LIMIT, MODEL, and FPS.

The script will:

Run Gemini on a small demo slice of VCBench
Write raw per-query-point outputs to outputs/
Convert the raw file to unified format
Compute GPA, MoC, and UDA

Manual Steps

If you want to run the pieces separately:

1. Gemini demo

python eval/demo_gemini.py \
  --video-dir data/videos \
  --input data/vcbench_eval.jsonl \
  --limit 5

2. Convert to unified format

python eval/unify_results.py outputs/vcbench_gemini_demo_*.jsonl outputs/vcbench_gemini_demo_unified.jsonl
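
Conceptually, this step gathers the query points that belong to one question into a single time-ordered record. The sketch below illustrates that idea only. The field names `question_id`, `query_time`, and `prediction` are hypothetical placeholders, and the actual schema read and written by `eval/unify_results.py` may differ:

```python
from collections import defaultdict


def group_by_question(raw_records):
    """Group per-query-point records into one entry per question.

    Field names are hypothetical placeholders, not the release schema.
    """
    grouped = defaultdict(list)
    for rec in raw_records:
        grouped[rec["question_id"]].append((rec["query_time"], rec["prediction"]))

    unified = []
    for qid, points in grouped.items():
        points.sort(key=lambda p: p[0])  # order query points by playback time
        unified.append({
            "question_id": qid,
            "query_times": [t for t, _ in points],
            "predictions": [p for _, p in points],
        })
    return unified


# Toy input: two query points for q1 (out of order) and one for q2.
raw = [
    {"question_id": "q1", "query_time": 30.0, "prediction": 2},
    {"question_id": "q1", "query_time": 10.0, "prediction": 1},
    {"question_id": "q2", "query_time": 5.0, "prediction": 0},
]
unified = group_by_question(raw)
print(unified)
```

Sorting by query time matters because the metrics look at how predictions evolve during playback, so the order of the sequence carries information.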

3. Compute metrics

python eval/compute_metrics.py outputs/vcbench_gemini_demo_unified.jsonl data/vcbench_eval.jsonl

The metric script prints each metric as a raw value between 0 and 1. Multiply by 100 if you want paper-style percentages.
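
The three manual steps can also be chained from Python. The sketch below only builds the commands shown above as argument lists. `build_pipeline` is an illustrative helper, not part of the release, and `raw_path` is left as a parameter because the exact name of the raw output file (matched above by `outputs/vcbench_gemini_demo_*.jsonl`) is not specified here:

```python
import shlex
import sys


def build_pipeline(raw_path, unified_path, video_dir="data/videos",
                   eval_jsonl="data/vcbench_eval.jsonl", limit=5):
    """Return the three manual-step commands as argv lists, in run order."""
    py = sys.executable or "python"
    return [
        # 1. Gemini demo: writes raw per-query-point outputs.
        [py, "eval/demo_gemini.py", "--video-dir", video_dir,
         "--input", eval_jsonl, "--limit", str(limit)],
        # 2. Convert the raw file to unified per-question format.
        [py, "eval/unify_results.py", raw_path, unified_path],
        # 3. Compute GPA, MoC, and UDA against the evaluation file.
        [py, "eval/compute_metrics.py", unified_path, eval_jsonl],
    ]


if __name__ == "__main__":
    # Placeholder paths for illustration; substitute the real raw output file.
    for cmd in build_pipeline("outputs/raw.jsonl", "outputs/unified.jsonl"):
        print(shlex.join(cmd))
```

To execute rather than print, pass each list to `subprocess.run(cmd, check=True)` so that a failing step stops the chain.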

Metric Definitions

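GPA: Gaussian Precision Accuracy. Measures the numerical precision of the predicted counts. Higher is better.
MoC: Monotonicity Consistency. Measures whether the sequence of predictions stays monotonically consistent across query points. Higher is better.
UDA: Update Direction Accuracy. Measures whether predictions change in the correct direction between query points. Higher is better.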
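
This README names the metrics without giving formulas. The authoritative definitions are in the paper and in `eval/compute_metrics.py`. The toy sketch below only illustrates the kind of quantity each name suggests. Every function name, the `sigma` parameter, and the formulas themselves are assumptions made for illustration, not VCBench's actual metrics:

```python
import math


def gaussian_score(pred, truth, sigma=1.0):
    """Gaussian-shaped closeness in (0, 1]; equals 1.0 when pred == truth. (Assumed form.)"""
    return math.exp(-((pred - truth) ** 2) / (2 * sigma ** 2))


def non_decreasing_fraction(preds):
    """Fraction of consecutive prediction pairs that do not decrease. (Assumed form.)"""
    pairs = list(zip(preds, preds[1:]))
    if not pairs:
        return 1.0
    return sum(b >= a for a, b in pairs) / len(pairs)


def direction_agreement(preds, truths):
    """Fraction of steps where the sign of the predicted change matches the truth. (Assumed form.)"""
    def sign(x):
        return (x > 0) - (x < 0)

    steps = list(zip(zip(preds, preds[1:]), zip(truths, truths[1:])))
    if not steps:
        return 1.0
    hits = sum(sign(p1 - p0) == sign(t1 - t0) for (p0, p1), (t0, t1) in steps)
    return hits / len(steps)


# Toy sequences over four query points of one cumulative-count question.
preds = [1, 2, 2, 4]
truths = [1, 2, 3, 4]
precision = sum(gaussian_score(p, t) for p, t in zip(preds, truths)) / len(preds)
print(precision, non_decreasing_fraction(preds), direction_agreement(preds, truths))
```

In this toy case the model misses one increment at the third query point, which lowers the precision-style and direction-style scores while leaving the non-decreasing check at its maximum. A non-decreasing check of this kind only makes sense for cumulative counts, since current-state counts can legitimately go down.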

Files

data/
  vcbench_eval.jsonl
  vcbench_data.jsonl
eval/
  demo_gemini.py
  unify_results.py
  compute_metrics.py
run_gemini_eval.sh
requirements.txt

Citation

@misc{liu2026vcbenchstreamingcountingbenchmark,
      title={VCBench: A Streaming Counting Benchmark for Spatial-Temporal State Maintenance in Long Videos}, 
      author={Pengyiang Liu and Zhongyue Shi and Hongye Hao and Qi Fu and Xueting Bi and Siwei Zhang and Xiaoyang Hu and Zitian Wang and Linjiang Huang and Si Liu},
      year={2026},
      eprint={2603.12703},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.12703}, 
}

License

This dataset and code are released under CC BY 4.0.
