REMOR : Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning

Notes

distilbert_models can be downloaded from Link
for grpo_training and review_generation, please download and extract the contents of distilbert_models in their directories.

Loading reward function and review generation

Environment Setup

conda create -n grpo python=3.11
conda activate grpo
pip install openreview-py
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install --user -U nltk
pip install ollama pandas matplotlib numpy seaborn tqdm
pip install git+https://github.com/huggingface/transformers.git@main

Calculating human-aligned weights

Environment Setup

conda create -n human-value-calculation python=3.12  

conda activate human-value-calculation 
  
pip install cvxpy ecos scikit-learn numpy pandas pyarrow

Calculation can be done by the notebook file optim_cv.ipynb located in /human_value_calculation

Training

SFT via LLaMA-Factory

Environment Setup

conda create -n llama-factory python=3.11  
conda activate llama-factory  

git clone https://github.com/hiyouga/LLaMA-Factory.git  
cd LLaMA-Factory  
pip install -e ".[torch,metrics]"  
pip install deepspeed==0.16.2  
pip install liger-kernel

GRPO via TRL

Environment Setup

conda create -n grpo python=3.11
conda activate grpo
pip install openreview-py
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

pip install --user -U nltk

pip install deepspeed

pip install trl[peft]
pip install bitsandbytes loralib
pip install git+https://github.com/huggingface/transformers.git@main

pip install git+https://github.com/huggingface/accelerate.git@main

pip install git+https://github.com/huggingface/trl.git

Launching TRL training scripts

conda activate grpo
accelerate launch trl_iclr_90th_u.py

Accessing Data from Huggingface.co

Below is a simple snippet to access the dataset via huggingface datasets.

from datasets import load_dataset

ds = load_dataset("pawin205/iclr-2017-2020-peer-review-with-thinking-trace")

Accessing Models from Huggingface.co

Below is a simple snippet to load the model via huggingface.

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pawin205/Qwen-7B-Review-ICLR-GRPO-U")
model = AutoModelForCausalLM.from_pretrained("pawin205/Qwen-7B-Review-ICLR-GRPO-U")

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
grpo_training		grpo_training
human_value_calculation		human_value_calculation
ollama_setup		ollama_setup
review_generation		review_generation
sft_training		sft_training
trace_generator		trace_generator
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

REMOR : Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning

Notes

Loading reward function and review generation

Environment Setup

Calculating human-aligned weights

Environment Setup

Training

SFT via LLaMA-Factory

Environment Setup

GRPO via TRL

Environment Setup

Launching TRL training scripts

Accessing Data from Huggingface.co

Accessing Models from Huggingface.co

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

REMOR : Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning

Notes

Loading reward function and review generation

Environment Setup

Calculating human-aligned weights

Environment Setup

Training

SFT via LLaMA-Factory

Environment Setup

GRPO via TRL

Environment Setup

Launching TRL training scripts

Accessing Data from Huggingface.co

Accessing Models from Huggingface.co

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages