lijun2005/CVPR26-DreamPRVR

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval

⭐ If DreamPRVR is helpful to your projects, please help star this repo. Thanks! 🤗

We sincerely invite readers to refer to our previous work ICCV25-HLFormer, as well as our curated Awesome-PRVR.

1. Introduction

This repository contains the implementation of our work, accepted to the CVPR 2026 main conference:

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval Jun Li, Xuhang Lou, Jinpeng Wang, Yuting Wang, Yaowei Wang, Shu-Tao Xia, Bin Chen.

We propose DreamPRVR, which adopts a coarse-to-fine learning paradigm. (i) The model first generates global contextual semantic registers as coarse-grained highlights spanning the entire video, and then concentrates on fine-grained similarity optimization for precise cross-modal matching. Concretely, the registers are initialized from the video-centric distribution produced by a probabilistic variational sampler and then iteratively refined by a text-supervised truncated diffusion model. (ii) During this process, textual semantic structure learning constructs a well-formed textual latent space, enhancing the reliability of global perception. (iii) The registers are then fused with video tokens through register-augmented Gaussian attention blocks, enabling context-aware learning.
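The coarse-to-fine flow in step (i) can be illustrated with a toy sketch: draw initial registers from a Gaussian with the reparameterization trick, then refine them over a small, fixed number of steps. This is not the authors' implementation. The names `sample_registers`, `refine`, `toy_denoise_step`, and `TARGET` are hypothetical, and the "denoiser" is a hand-written stand-in that moves halfway toward a fixed target, where the real model would use a learned, text-supervised network.

```python
import math
import random

# Hypothetical stand-in for the text-derived supervision signal.
TARGET = (1.0, -1.0, 0.5)


def sample_registers(mu, log_var, rng):
    """Reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def toy_denoise_step(registers, t):
    """Toy denoiser (assumption): move each value halfway toward TARGET.

    The step index t is accepted but unused; a learned denoiser would
    normally condition on it.
    """
    return [r + 0.5 * (g - r) for r, g in zip(registers, TARGET)]


def refine(registers, denoise_step, num_steps):
    """Run a short, fixed-length refinement chain (the 'truncated' part)."""
    for t in reversed(range(num_steps)):
        registers = denoise_step(registers, t)
    return registers


rng = random.Random(0)
mu = [0.0, 0.0, 0.0]        # mean of the video-centric distribution (toy values)
log_var = [0.0, 0.0, 0.0]   # log-variance of that distribution (toy values)

coarse = sample_registers(mu, log_var, rng)
refined = refine(coarse, toy_denoise_step, num_steps=4)
```

With this toy denoiser, each step halves the distance to `TARGET`, so four steps shrink the initial error by a factor of 16.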

2. Preparation

git clone https://github.com/lijun2005/CVPR26-DreamPRVR.git
cd CVPR26-DreamPRVR/

2.1 Requirements

We train on Charades-STA with an NVIDIA 3080 Ti, using the following environment:

  • python==3.11.8
  • pytorch==2.0.1

We train on TVR and ActivityNet Captions with an NVIDIA A100-40G, using the following environment:

  • python==3.9.17
  • pytorch==2.0.1

2.2 Download the datasets

All features can be downloaded from Baidu Pan or Google Drive (thanks to ms-sl).

!!! Please note that we did not use any features derived from ViT.

The dataset directory is organized as follows:

DreamPRVR/
    ├── activitynet/
    │   ├── FeatureData/
    │   ├── TextData/
    │   ├── val_1.json
    │   └── val_2.json
    ├── charades/
    │   ├── FeatureData/
    │   └── TextData/
    └── tvr/
        ├── FeatureData/
        └── TextData/
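Before training, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch using only the standard library. The helper name `missing_entries` and the `EXPECTED` mapping are illustrative and not part of this repository.

```python
from pathlib import Path

# Expected entries per dataset folder, copied from the directory tree above.
EXPECTED = {
    "activitynet": ["FeatureData", "TextData", "val_1.json", "val_2.json"],
    "charades": ["FeatureData", "TextData"],
    "tvr": ["FeatureData", "TextData"],
}


def missing_entries(data_root):
    """Return relative paths from the expected layout that do not exist."""
    root = Path(data_root)
    return [
        f"{dataset}/{name}"
        for dataset, names in EXPECTED.items()
        for name in names
        if not (root / dataset / name).exists()
    ]
```

Calling `missing_entries("/path/to/DreamPRVR")` (a placeholder path) returns an empty list when every expected folder and file is present.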

We convert feature.bin into feature.hdf5. Please refer to src/Utils/convert_hdf5.py (thanks to FAWL).

Finally, set root and data_root in the config files (e.g., cfg['root'] and cfg['data_root'] in ./src/Configs/tvr.py).
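A quick sanity check on these two entries can catch path typos before a long training run starts. The sketch below assumes only that `cfg` is a dict-like object with `root` and `data_root` keys, as the config example suggests. The helper `unset_paths` is hypothetical, and the requirement that both values point to existing directories is an assumption.

```python
from pathlib import Path

REQUIRED_KEYS = ("root", "data_root")


def unset_paths(cfg):
    """Return required keys that are missing, empty, or not an existing directory."""
    return [
        key for key in REQUIRED_KEYS
        if not cfg.get(key) or not Path(cfg[key]).is_dir()
    ]
```

An empty return value means both paths are set and exist on disk.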

3. Run

3.1 Train

To train DreamPRVR on ActivityNet Captions:

cd src
python main.py -d act --gpu 0

To train DreamPRVR on Charades-STA:

cd src
python main.py -d cha --gpu 0

To train DreamPRVR on TVR:

cd src
python main.py -d tvr --gpu 0
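The three commands above differ only in the dataset code passed to `-d`. For scripted runs, a small wrapper can build the command from the dataset name. This is a sketch: `DATASET_CODES` and `build_train_command` are illustrative names, not part of the repository, and only the `-d` and `--gpu` flags shown above are used.

```python
import subprocess

# Dataset codes taken from the training commands above.
DATASET_CODES = {
    "ActivityNet Captions": "act",
    "Charades-STA": "cha",
    "TVR": "tvr",
}


def build_train_command(dataset, gpu=0):
    """Build the argument list for training on the given dataset."""
    return ["python", "main.py", "-d", DATASET_CODES[dataset], "--gpu", str(gpu)]


# Example (run from the repository root; uncomment to launch training):
# subprocess.run(build_train_command("TVR"), cwd="src", check=True)
```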

3.2 Retrieval Performance

For this repository, the expected performance is:

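| Dataset              | R@1  | R@5  | R@10 | R@100 | SumR  | Log     | Ckpt     |
|----------------------|------|------|------|-------|-------|---------|----------|
| ActivityNet Captions | 8.7  | 27.5 | 40.3 | 79.5  | 156.1 | act-log | act-ckpt |
| Charades-STA         | 2.6  | 8.7  | 14.5 | 54.2  | 80.0  | cha-log | cha-ckpt |
| TVR                  | 17.4 | 39.0 | 50.4 | 86.2  | 193.1 | tvr-log | tvr-ckpt |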
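For reference, R@K is the percentage of text queries whose ground-truth video is ranked within the top K results, and SumR is the sum of R@1, R@5, R@10, and R@100. The sketch below computes both from a list of 1-based ground-truth ranks. It is an illustration of the metric definitions, not the repository's evaluation code, and the function names are hypothetical.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose ground-truth video ranks within the top k.

    `ranks` holds the 1-based rank of the ground-truth video for each query.
    """
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def sum_r(ranks, ks=(1, 5, 10, 100)):
    """SumR: the sum of R@K over the listed cutoffs."""
    return sum(recall_at_k(ranks, k) for k in ks)


# Toy example: five queries with ground-truth ranks 1, 3, 7, 50, and 200.
example_ranks = [1, 3, 7, 50, 200]
example_sum_r = sum_r(example_ranks)
```

Because each reported column is rounded separately, SumR can differ by about 0.1 from the sum of the rounded R@K values, which is likely why a few rows do not add up exactly.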

4. References

If you find our code useful or use the toolkit in your work, please consider citing:

@misc{li2026dreamprvr,
      title={Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval}, 
      author={Jun Li and Xuhang Lou and Jinpeng Wang and Yuting Wang and Yaowei Wang and Shu-Tao Xia and Bin Chen},
      year={2026},
      eprint={2604.03653},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.03653}, 
}

5. Acknowledgements

This code is based on HLFormer and GMMFormerV2. We are also grateful to the other teams whose open-source code inspired our work, including MS-SL and DiffIR.

6. Contact

If you have any questions, please raise an issue or email Jun Li (220110924@stu.hit.edu.cn) and Jinpeng Wang (wangjp26@gmail.com).
