A professional, modular pipeline to fine-tune Llama-3 8B using Direct Preference Optimization (DPO) — from training to inference to benchmarking.
This repository contains a complete, production-grade pipeline for aligning a large language model using DPO. Instead of manually patching notebooks together, this project is structured as reusable Python scripts with YAML configurations so anyone can reproduce the training with a single command.
| Concept | What We Did |
|---|---|
| Base Model | unsloth/llama-3-8b-Instruct-bnb-4bit |
| Alignment Technique | Direct Preference Optimization (DPO) |
| Dataset | Intel/orca_dpo_pairs (1,000 samples) |
| Speed Optimization | Unsloth (2x faster training) |
| Memory Optimization | 4-bit quantization + Gradient Checkpointing |
| Environment | Kaggle T4 x2 GPU (Free Tier) |
| Trained Adapter | 🤗 Karan6124/llama3-8b-dpo-orca-adapter |
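For orientation, DPO skips the separate reward model of RLHF and optimizes the policy directly on preference pairs. A minimal sketch of the standard per-example DPO objective (the function name and toy values below are ours for illustration, not code from this repo):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is a summed log-probability of the chosen/rejected response
    under the policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))
```

Raising the policy's log-probability on the chosen response (relative to the reference) lowers the loss, which is exactly the preference signal the trainer follows; `beta` controls how far the policy may drift from the reference.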
```
llama3-dpo-alignment-pipeline/
├── 📁 configs/
│   ├── dpo_config.yaml            # All DPO training hyperparameters
│   └── benchmark_config.yaml      # Test prompts & generation settings
├── 📁 scripts/
│   └── train_dpo.py               # The main training engine (reads from configs/)
├── 📁 inference/
│   └── inference.py               # Load adapter and run interactive inference
├── 📁 evaluation/
│   └── benchmark.py               # Compare Base vs. Aligned model side-by-side
├── 📁 training/
│   └── training-llama3-dpo.ipynb  # The original Kaggle notebook
├── 📁 models/                     # (gitignored) Local adapter weights live here
├── pyproject.toml                 # Dependency management with uv
└── README.md
```
This project uses uv, an extremely fast Python package manager, so the entire environment is managed with a single command.
```powershell
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

```bash
git clone https://github.com/Edge-Explorer/llama3-dpo-alignment-pipeline.git
cd llama3-dpo-alignment-pipeline
uv sync
```

Note: Unsloth requires an NVIDIA GPU to import. For local development, you can write and review code without a GPU; use Kaggle or Google Colab to actually run the scripts.
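Because Unsloth fails at import time without CUDA, a guard along these lines keeps local tooling usable on CPU-only machines (our sketch, not part of the repo; the function name is hypothetical):

```python
import importlib.util

def gpu_stack_ready():
    """Best-effort check: torch and unsloth installed, and a CUDA device visible."""
    for mod in ("torch", "unsloth"):
        if importlib.util.find_spec(mod) is None:
            return False
    import torch
    return torch.cuda.is_available()

if gpu_stack_ready():
    from unsloth import FastLanguageModel  # GPU-only import is now safe
else:
    print("No GPU stack detected; run the scripts on Kaggle or Google Colab.")
```

`find_spec` only checks that a module is installed without importing it, so the check itself is safe to run anywhere.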
All training parameters are in configs/dpo_config.yaml. You can tweak learning rates, batch sizes, and sequence lengths without touching any Python code.
```bash
# On Kaggle or a GPU machine:
uv run python scripts/train_dpo.py
```

Key hyperparameters (optimized for the T4 GPU):

- `beta: 0.1` (DPO temperature)
- `learning_rate: 5e-6`
- `per_device_train_batch_size: 1`
- `gradient_accumulation_steps: 8`
- `max_length: 768`
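For illustration, those values could map onto a `dpo_config.yaml` along these lines. The key names below match the hyperparameters listed above, but the exact schema and nesting are an assumption; check `configs/dpo_config.yaml` for the real layout:

```yaml
# Illustrative sketch only; the real schema lives in configs/dpo_config.yaml
beta: 0.1                        # DPO temperature
learning_rate: 5.0e-6
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
max_length: 768
```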
We tracked the entire DPO alignment process using Weights & Biases (WandB).
The training loss shows a smooth convergence, confirming that the DPO adapter is effectively learning the preference pairs.

We optimized the pipeline for Kaggle T4 x2 GPUs. Memory usage remained stable under 15GB thanks to 4-bit quantization and gradient checkpointing.
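That headroom is easy to sanity-check: at 4 bits per weight, the frozen base model alone occupies roughly 4 GB, leaving room on a 16 GB T4 for LoRA parameters, optimizer state, and checkpointed activations. A back-of-envelope calculation (illustrative, not a measurement):

```python
# Rough VRAM footprint of the frozen 4-bit base weights of an 8B-parameter model.
params = 8e9
bytes_per_param = 0.5               # 4 bits = half a byte
base_weights_gb = params * bytes_per_param / 1024**3
print(f"{base_weights_gb:.2f} GB")  # ≈ 3.73 GB for the base weights alone
```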

Download the trained adapter from Hugging Face and load it locally.
```bash
# Make sure the adapter is in models/llama3_dpo_adapter/
uv run python inference/inference.py
```

Or use it directly in your own code:
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Karan6124/llama3-8b-dpo-orca-adapter",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Why use DPO over SFT?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Run the benchmark script on Kaggle to generate a comparison report:
```bash
uv run python evaluation/benchmark.py
```

Results are saved automatically to `evaluation/benchmark_report.txt`.
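At its core, a base-vs-aligned comparison just prompts both models and lays the outputs side by side. A minimal, model-agnostic sketch of that idea (the function name and stub generators are ours; `evaluation/benchmark.py` has its own structure):

```python
def side_by_side_report(prompts, generate_base, generate_aligned):
    """Build a plain-text report comparing two generate callables (prompt -> str)."""
    sections = []
    for prompt in prompts:
        sections.append(
            f"PROMPT: {prompt}\n"
            f"  BASE:    {generate_base(prompt)}\n"
            f"  ALIGNED: {generate_aligned(prompt)}"
        )
    return ("\n" + "-" * 60 + "\n").join(sections)

# Usage with stub generators; in practice these would wrap model.generate calls.
report = side_by_side_report(
    ["Why use DPO over SFT?"],
    generate_base=lambda p: "(base answer)",
    generate_aligned=lambda p: "(aligned answer)",
)
print(report)
```

Keeping the generators as plain callables makes the report logic testable without loading either model.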
| Problem | Solution |
|---|---|
| `transformers` 5.0.0 broke `trl` in Colab | Switched to Kaggle's stable environment |
| `DPOConfig` not found in old `trl` | Pinned `trl>=0.12.0` |
| `OutOfMemoryError` on T4 GPU | Reduced batch size to 1, enabled gradient checkpointing |
| Slow training | Unsloth's `PatchDPOTrainer` gave ~2x speedup |
| Messy notebook workflow | Refactored into reusable scripts + YAML configs |
This project is licensed under the MIT License — see LICENSE for details.
- Unsloth for making LLM fine-tuning incredibly fast
- TRL by HuggingFace for the DPO implementation
- Intel/orca_dpo_pairs for the training dataset
- Kaggle for the free GPUs that made this possible