buaaplay/VCBench

VCBench: A Streaming Counting Benchmark for Spatial-Temporal State Maintenance in Long Videos

VCBench is a streaming counting benchmark for long videos. It treats counting as a minimal probe for diagnosing spatial-temporal state maintenance in video-language models. The benchmark queries a model at multiple time points during playback and measures how its predictions evolve over time, rather than only checking a single final answer.

Abstract

VCBench decomposes counting into eight subcategories across two axes: object counting and event counting. Object counting includes current-state snapshots, state deltas, identity-tracking counts, and windowed gains. Event counting includes atomic actions, state transitions, episodic segments, and periodic actions. The dataset contains 406 videos, 1,000 questions, 4,576 query points, and 10,071 annotated event or state-change moments. The evaluation protocol uses three complementary metrics: GPA for numerical precision, MoC for monotonic consistency, and UDA for update-direction accuracy.

What Is Included

  • data/vcbench_eval.jsonl
  • data/vcbench_data.jsonl
  • eval/demo_gemini.py
  • eval/unify_results.py
  • eval/compute_metrics.py
  • run_gemini_eval.sh
  • requirements.txt
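
Before running anything, it can help to check that the two JSONL files load cleanly. The following is a minimal sketch that makes no assumption about the record schema: it counts the non-empty lines and returns the first parsed records so you can inspect their keys. The helper name `peek_jsonl` is hypothetical and is not part of this release.

```python
import json
from pathlib import Path


def peek_jsonl(path, max_records=1):
    """Return (record_count, first_records) for a JSON Lines file."""
    count = 0
    head = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            count += 1
            if len(head) < max_records:
                head.append(json.loads(line))
    return count, head
```

For example, `peek_jsonl("data/vcbench_eval.jsonl")` returns the number of records together with the first one, whose keys show the fields the evaluation scripts have to work with.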

What This Release Can Do

  • Run a Gemini evaluation demo on VCBench
  • Convert raw per-query-point results into unified per-question format
  • Compute GPA, MoC, and UDA

This release is designed so that someone who clones the repo can follow the README and run the provided scripts.

Download Data From Hugging Face

Download the benchmark videos from the Hugging Face dataset:

huggingface-cli download buaaplay/VCBench --repo-type dataset --local-dir data/videos

The demo script expects the source videos to be organized like this:

data/videos/
  RoomTour3D/
    -FZTi5EfPSQ.mp4
  scannetpp/
    09c1414f1b.mp4
  ...
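
A quick way to confirm the download landed in the expected layout is to count the `.mp4` files in each first-level subfolder. This is a rough sketch, assuming only the `<source>/<video>.mp4` structure shown above; `count_videos_by_source` is a hypothetical helper, not something shipped in the repo.

```python
from collections import Counter
from pathlib import Path


def count_videos_by_source(video_dir):
    """Count .mp4 files in each first-level subfolder of video_dir."""
    counts = Counter()
    for mp4 in Path(video_dir).glob("*/*.mp4"):
        counts[mp4.parent.name] += 1
    return dict(counts)
```

Calling `count_videos_by_source("data/videos")` should list source folders such as `RoomTour3D` and `scannetpp`. An empty result usually means the videos sit one directory level higher or lower than the demo script expects.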

Install

pip install -r requirements.txt

One-Command Demo

Set your Gemini key:

export GEMINI_API_KEY="your-gemini-api-key"

Then run the provided shell script:

bash run_gemini_eval.sh --video-dir data/videos --limit 5

You can also override the defaults with environment variables such as VIDEO_DIR, INPUT_JSONL, LIMIT, MODEL, and FPS.
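
If you launch the script from Python rather than a shell, the same overrides can be passed through the process environment. The sketch below only uses variable names listed above, with the values from the command-line example. How the script resolves a variable against a conflicting flag is not documented here, so treat this as an illustration; `build_env` and `run_demo` are hypothetical helpers.

```python
import os
import subprocess


def build_env(overrides, base=None):
    """Merge overrides into a copy of the environment, as strings."""
    env = dict(os.environ if base is None else base)
    env.update({key: str(value) for key, value in overrides.items()})
    return env


def run_demo(overrides):
    """Run the provided shell script with the given environment overrides."""
    return subprocess.run(
        ["bash", "run_gemini_eval.sh"],
        env=build_env(overrides),
        check=True,  # raise CalledProcessError if the script fails
    )
```

For instance, `run_demo({"VIDEO_DIR": "data/videos", "LIMIT": 5})` should correspond to the command-line example above, assuming `GEMINI_API_KEY` is already exported in the parent environment.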

The script will:

  1. Run Gemini on a small demo slice of VCBench
  2. Write raw per-query-point outputs to outputs/
  3. Convert the raw file to unified format
  4. Compute GPA, MoC, and UDA

Manual Steps

If you want to run the pieces separately:

1. Gemini demo

python eval/demo_gemini.py \
  --video-dir data/videos \
  --input data/vcbench_eval.jsonl \
  --limit 5

2. Convert to unified format

python eval/unify_results.py outputs/vcbench_gemini_demo_*.jsonl outputs/vcbench_gemini_demo_unified.jsonl
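
One thing to watch: after a first run, the wildcard `outputs/vcbench_gemini_demo_*.jsonl` also matches `outputs/vcbench_gemini_demo_unified.jsonl`, and it matches every earlier raw file as well. The sketch below picks the most recently modified raw file and skips the unified output. It assumes only the file naming shown in the command above; `newest_raw_result` is a hypothetical helper.

```python
from pathlib import Path


def newest_raw_result(pattern="outputs/vcbench_gemini_demo_*.jsonl", root="."):
    """Return the newest file matching pattern, skipping unified outputs."""
    candidates = [
        p for p in Path(root).glob(pattern)
        if p.is_file() and not p.name.endswith("_unified.jsonl")
    ]
    if not candidates:
        return None
    # Most recent modification time wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be passed to `eval/unify_results.py` as its first argument in place of the wildcard.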

3. Compute metrics

python eval/compute_metrics.py outputs/vcbench_gemini_demo_unified.jsonl data/vcbench_eval.jsonl

The metric script prints raw 0-1 values. Multiply by 100 if you want paper-style percentages.
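
The conversion is a plain multiplication by 100. A small sketch follows; the score values in the usage note are made up for illustration and are not results from the paper.

```python
def to_percent(scores, digits=2):
    """Convert a dict of 0-1 metric values to rounded percentages."""
    return {name: round(value * 100, digits) for name, value in scores.items()}
```

With illustrative inputs, `to_percent({"GPA": 0.5, "MoC": 0.25, "UDA": 0.125})` gives `{"GPA": 50.0, "MoC": 25.0, "UDA": 12.5}`.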

Metric Definitions

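  • GPA: Gaussian Precision Accuracy. Measures the numerical precision of the predicted counts. Higher is better.
  • MoC: Monotonicity Consistency. Measures whether predictions stay monotonically consistent across query points. Higher is better.
  • UDA: Update Direction Accuracy. Measures whether predictions move in the correct direction between query points. Higher is better.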
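
To make the intuition behind the three names concrete, here is a rough sketch of simplified stand-ins. These are NOT the definitions used by `eval/compute_metrics.py` or the paper: the Gaussian kernel, the `sigma` parameter, and both step-wise fractions are assumptions chosen for illustration only. Use the provided metric script for any reported number.

```python
import math


def gaussian_score(pred, truth, sigma=1.0):
    """Assumed stand-in: 1.0 for an exact count, decaying with the error."""
    return math.exp(-((pred - truth) ** 2) / (2 * sigma ** 2))


def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_agreement(preds, truths):
    """Assumed stand-in: share of steps where prediction and truth move alike."""
    pairs = [
        (_sign(p1 - p0), _sign(t1 - t0))
        for p0, p1, t0, t1 in zip(preds, preds[1:], truths, truths[1:])
    ]
    if not pairs:
        return None  # fewer than two query points
    return sum(dp == dt for dp, dt in pairs) / len(pairs)


def non_decreasing_fraction(preds):
    """Assumed stand-in: share of steps where the prediction does not drop."""
    steps = list(zip(preds, preds[1:]))
    if not steps:
        return None  # fewer than two query points
    return sum(b >= a for a, b in steps) / len(steps)
```

All three stand-ins take the sequence of answers a model gives at successive query points of one question, which is the streaming setup the benchmark is built around.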

Files

data/
  vcbench_eval.jsonl
  vcbench_data.jsonl
eval/
  demo_gemini.py
  unify_results.py
  compute_metrics.py
run_gemini_eval.sh
requirements.txt

Citation

@misc{liu2026vcbenchstreamingcountingbenchmark,
      title={VCBench: A Streaming Counting Benchmark for Spatial-Temporal State Maintenance in Long Videos}, 
      author={Pengyiang Liu and Zhongyue Shi and Hongye Hao and Qi Fu and Xueting Bi and Siwei Zhang and Xiaoyang Hu and Zitian Wang and Linjiang Huang and Si Liu},
      year={2026},
      eprint={2603.12703},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.12703}, 
}

License

This dataset and code are released under CC BY 4.0.
