Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning

Haozhe Wang$^{β™ ,β™₯,♦}$, Qixin Xu$^{β™₯,♦}$, Che Liu$^{♀}$, Junhong Wu$^{β™‘}$, $^\dagger$Fangzhen Lin$^{β™ }$, $^\dagger$Wenhu Chen$^{β™₯}$

The Hong Kong University of Science and Technology$^{β™ }$, University of Waterloo$^{β™₯}$, M-A-P$^{♦}$, Tsinghua University$^{♣}$, Imperial College London$^{♀}$, UCAS$^{β™‘}$

Paper | Hugging Face Collection

πŸ“– TL;DR

Reinforcement Learning (RL) has been a game-changer for teaching LLMs complex reasoning, but how it works has remained a mystery. Puzzling behaviors such as sudden "aha moments" and performance boosts from longer answers ("length-scaling") have been observed but not understood.

In this work, we show that these behaviors are not random quirks. They are the hallmarks of an emergent reasoning hierarchy, in which the model learns to reason much like a human: by separating high-level strategic planning from low-level procedural execution. We show that this process unfolds in two overlapping phases, and we leverage this insight to design a more efficient RL algorithm.

πŸš€ Release Plan

We will release the training recipe (built on top of VeRL) and all trained models.

Stay tuned for updates!

🍊 Citation

If you find our work useful for your research, please consider citing our paper:

@article{wang2025emergent,
  title={Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning},
  author={Wang, Haozhe and Xu, Qixin and Liu, Che and Wu, Junhong and Lin, Fangzhen and Chen, Wenhu},
  journal={arXiv preprint arXiv:2509.03646},
  year={2025}
}
