⭐ If DreamPRVR is helpful to your projects, please consider starring this repo. Thanks! 🤗
We sincerely invite readers to refer to our previous work ICCV25-HLFormer, as well as our curated Awesome-PRVR.
This repository contains the implementation of our work at the CVPR 2026 main conference:

**Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval**

Jun Li, Xuhang Lou, Jinpeng Wang, Yuting Wang, Yaowei Wang, Shu-Tao Xia, Bin Chen
We propose DreamPRVR, which adopts a coarse-to-fine learning paradigm:
(i) The model first generates global contextual semantic registers as coarse-grained highlights spanning the entire video and then concentrates on fine-grained similarity optimization for precise cross-modal matching. Concretely, these registers are generated by initializing from the video-centric distribution produced by a probabilistic variational sampler and then iteratively refined via a text-supervised truncated diffusion model.
(ii) During this process, textual semantic structure learning constructs a well-formed textual latent space, enhancing the reliability of global perception.
(iii) The registers are then fused with video tokens through register-augmented Gaussian attention blocks, enabling context-aware learning.
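The register-augmented Gaussian attention blocks above are defined in the paper and the source code. As a rough, hypothetical illustration of the underlying idea only (dot-product attention scores biased by a Gaussian locality prior over temporal distance), a dependency-free toy version might look like the following; `gaussian_attention` and its arguments are our own names for this sketch, not the repo's API:

```python
import math

def gaussian_attention(q, k, v, sigma=2.0):
    """Toy self-attention whose scores are modulated by a Gaussian
    window over temporal distance (illustrative only; not the exact
    block used in DreamPRVR).

    q, k, v: lists of T token vectors, each a list of d floats.
    sigma:   width of the Gaussian locality prior.
    """
    T, d = len(q), len(q[0])
    out = []
    for i in range(T):
        # dot-product scores plus a Gaussian bias in log-space:
        # nearby positions j are penalized less than distant ones
        scores = [
            sum(qa * kb for qa, kb in zip(q[i], k[j])) / math.sqrt(d)
            - (i - j) ** 2 / (2.0 * sigma ** 2)
            for j in range(T)
        ]
        m = max(scores)                       # numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        # attended output: convex combination of the value vectors
        out.append([sum(w[j] * v[j][c] for j in range(T)) for c in range(d)])
    return out
```

Smaller `sigma` concentrates each query on its temporal neighborhood; larger `sigma` recovers ordinary global attention.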
```
git clone https://github.com/lijun2005/CVPR26-DreamPRVR.git
cd CVPR26-DreamPRVR/
```

We train Charades-STA on an Nvidia 3080 Ti with the following environment:
- python==3.11.8
- pytorch==2.0.1
We train TVR and ActivityNet Captions on an Nvidia A100-40G with the following environment:
- python==3.9.17
- pytorch==2.0.1
All features can be downloaded from Baidu Pan or Google Drive (thanks to MS-SL).
!!! Please note that we did not use any features derived from ViT.
The dataset directory is organized as follows:

```
DreamPRVR/
├── activitynet/
│   ├── FeatureData/
│   ├── TextData/
│   ├── val_1.json
│   └── val_2.json
├── charades/
│   ├── FeatureData/
│   └── TextData/
└── tvr/
    ├── FeatureData/
    └── TextData/
```

We convert `feature.bin` into `feature.hdf5`. Please refer to `src/Utils/convert_hdf5.py` (thanks to FAWL).
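The authoritative conversion script is `src/Utils/convert_hdf5.py`, which reads the released `feature.bin` format. Purely to illustrate what the resulting HDF5 layout looks like (one dataset per video ID), here is a hypothetical sketch that starts from an in-memory dict instead of `feature.bin`; the function names are ours, not the repo's:

```python
import h5py
import numpy as np

def convert_to_hdf5(features, out_path):
    """Write a {video_id: (n_frames, dim) feature matrix} dict to one
    HDF5 file, one dataset per video. Illustrative sketch only; the
    repo's actual script parses the released feature.bin format."""
    with h5py.File(out_path, "w") as f:
        for vid, feat in features.items():
            f.create_dataset(vid, data=np.asarray(feat, dtype=np.float32))

def load_video_feature(path, vid):
    # Read one video's feature matrix back from the HDF5 file.
    with h5py.File(path, "r") as f:
        return f[vid][:]
```

Per-video datasets keyed by video ID make random access during training cheap, since only the requested video's features are read from disk.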
Finally, set `root` and `data_root` in the config files (e.g., `cfg['root']` and `cfg['data_root']` in `./src/Configs/tvr.py`).
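Assuming the config files store these settings as plain dictionary entries, as the `cfg['root']` notation above suggests, the edit would look something like this; both paths are placeholders for your machine:

```python
# ./src/Configs/tvr.py (excerpt; both paths are placeholders)
cfg['root'] = '/path/to/CVPR26-DreamPRVR'   # repository root
cfg['data_root'] = '/path/to/tvr/features'  # directory with the downloaded features
```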
To train DreamPRVR on ActivityNet Captions:

```
cd src
python main.py -d act --gpu 0
```

To train DreamPRVR on Charades-STA:

```
cd src
python main.py -d cha --gpu 0
```

To train DreamPRVR on TVR:

```
cd src
python main.py -d tvr --gpu 0
```
For this repository, the expected performance is:
| Dataset | R@1 | R@5 | R@10 | R@100 | SumR | Log | Ckpt |
|---|---|---|---|---|---|---|---|
| ActivityNet Captions | 8.7 | 27.5 | 40.3 | 79.5 | 156.1 | act-log | act-ckpt |
| Charades-STA | 2.6 | 8.7 | 14.5 | 54.2 | 80.0 | cha-log | cha-ckpt |
| TVR | 17.4 | 39.0 | 50.4 | 86.2 | 193.1 | tvr-log | tvr-ckpt |
If you find our code useful or use the toolkit in your work, please consider citing:
```
@misc{li2026dreamprvr,
      title={Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval},
      author={Jun Li and Xuhang Lou and Jinpeng Wang and Yuting Wang and Yaowei Wang and Shu-Tao Xia and Bin Chen},
      year={2026},
      eprint={2604.03653},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.03653},
}
```
This code is based on HLFormer and GMMFormerV2. We are also grateful to other teams for open-sourcing code that inspired our work, including MS-SL and DiffIR.
If you have any questions, you can raise an issue or email Jun Li (220110924@stu.hit.edu.cn) or Jinpeng Wang (wangjp26@gmail.com).