Train a minimal, readable LLaMA-style transformer end‑to‑end in PyTorch. This repo focuses on clarity over cleverness: straightforward modules, explicit configs, and a tiny training loop you can understand in one sitting.
- Clean architecture: `LLamaModel` with `TransformerBlock`, `GroupedQueryAttention`, `RMSNorm`, and a gated MLP (see the `RMSNorm` sketch after this list).
- Tokenization: SentencePiece-based tokenizer compatible with LLaMA vocab files.
- Tiny training loop: CPU default for portability; easy to switch to CUDA.
- Zero magic: Plain PyTorch, no hidden frameworks.
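For a flavor of the code style, here is a minimal RMSNorm in the spirit of `src/layers/rms.py`. This is a sketch of the standard LLaMA-style normalization, not the repo's exact implementation:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm as used in LLaMA (illustrative sketch)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal RMS over the last dimension, then apply the gain.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)
```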
Prereqs: Python 3.11+, macOS/Linux, uv.
Setup (uv):

```
uv sync
```

Run training:

```
uv run -m src.training.train
```

By default the script runs on CPU and prints periodic train/val loss plus a short sample generation at the end of each epoch.
```
src/
  data/
    data_loader.py             # Creates streaming token loaders for train/val
  layers/
    feed_forward_layer.py      # Gated MLP block
    grouped_query_attention.py # Multi-head attention with GQA
    rms.py                     # RMSNorm layer
  models/
    block.py                   # Transformer block (attention + MLP + norms)
    llama3.py                  # LLamaModel definition
  tokenizer/
    tokenizer.py               # LlamaTokenizer (SentencePiece)
  training/
    loss.py                    # Cross-entropy loss helpers and eval
    train.py                   # Training loop entrypoint
  utils/
    tokenization.py            # Generation utilities, HF login + downloads
```
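To make the layout concrete, the streaming loaders in `data_loader.py` boil down to sliding windows of next-token pairs over one long token stream. The class and argument names below are illustrative assumptions, not the repo's actual API:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TokenWindowDataset(Dataset):
    """Sliding (input, target) windows over a token list (illustrative sketch)."""

    def __init__(self, token_ids: list[int], context_length: int, stride: int):
        self.inputs, self.targets = [], []
        # Each window's target is the same window shifted one token to the right.
        for i in range(0, len(token_ids) - context_length, stride):
            chunk = token_ids[i : i + context_length + 1]
            self.inputs.append(torch.tensor(chunk[:-1]))
            self.targets.append(torch.tensor(chunk[1:]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Hypothetical usage: token_ids would come from LlamaTokenizer over verdict.txt.
# loader = DataLoader(TokenWindowDataset(token_ids, 256, 128), batch_size=4, shuffle=True)
```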
Top-level:
- `pyproject.toml`: package metadata and dependencies
- `config.json`: holds secrets like `HF_ACCESS_TOKEN` (see Security)
- `Llama-2-7b/tokenizer.model`: SentencePiece model file (vocab)
- `verdict.txt`: example text corpus (small demo dataset)
- Place your training text in a single file, e.g. `verdict.txt`.
- The tokenizer uses SentencePiece. A compatible LLaMA tokenizer file should be available at `Llama-2-7b/tokenizer.model`.

If you need to authenticate to download the tokenizer, set a Hugging Face token (see Security) and use the utilities in `src/utils/tokenization.py`.
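If you prefer to hit the Hub directly instead of the repo's helpers, a sketch with `huggingface_hub` might look like this. The repo id is an assumption; `meta-llama/Llama-2-7b` is gated, so you must accept Meta's license on huggingface.co first:

```python
import os
from huggingface_hub import login, hf_hub_download

# Prefer an environment variable over config.json for the token (see Security).
login(token=os.environ["HF_ACCESS_TOKEN"])

# Downloads tokenizer.model into the Llama-2-7b/ directory.
path = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",  # assumed repo id; gated access
    filename="tokenizer.model",
    local_dir="Llama-2-7b",
)
print(path)
```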
Main entrypoint: `src/training/train.py`
Key hyperparameters (edit inside the script):
```python
config = {
    "emb_dim": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "num_kv_heads": 8,
    "hidden_dim": 11008,
    "context_length": 4096,
    "vocab_size": 32000,
    "dtype": torch.bfloat16,
}
```

Other knobs:
- `num_epochs`, `eval_freq`, `eval_iter`
- `batch_size`, `stride`, `shuffle` in the data loader
- `start_context` used for sample generation between epochs
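The values above match a 7B-scale model, which is far too large for CPU training. For a quick smoke test, a scaled-down config might look like the following; the specific values are illustrative, not defaults shipped with the repo:

```python
import torch

config = {
    "emb_dim": 256,          # narrow embedding width
    "n_layers": 4,           # a handful of transformer blocks
    "n_heads": 8,            # attention heads
    "num_kv_heads": 2,       # grouped-query KV heads (must divide n_heads)
    "hidden_dim": 688,       # gated-MLP width, ~2.7x emb_dim as in LLaMA
    "context_length": 256,   # shorter sequences keep iteration fast
    "vocab_size": 32000,     # must match the SentencePiece model
    "dtype": torch.float32,  # bfloat16 can be slow or unsupported on CPU
}
```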
Switching to CUDA:

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```

The training script prints a short completion each epoch using helpers from `src/utils/tokenization.py`.
For ad‑hoc generation after training, you can import the same utilities and call `generate_next_tokens` with your `LLamaModel` and `LlamaTokenizer`, as sketched below.
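A hypothetical call might look like the following; the argument names and constructor signatures are assumptions, so check `src/utils/tokenization.py` and the model/tokenizer modules for the real interfaces:

```python
# Argument names below are assumptions; see src/utils/tokenization.py.
from src.models.llama3 import LLamaModel
from src.tokenizer.tokenizer import LlamaTokenizer
from src.utils.tokenization import generate_next_tokens

tokenizer = LlamaTokenizer("Llama-2-7b/tokenizer.model")  # assumed constructor
model = LLamaModel(config)  # config as defined above
model.eval()

print(generate_next_tokens(model, tokenizer, "Every effort moves you", max_new_tokens=50))
```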
- `config.json` may contain secrets like `HF_ACCESS_TOKEN`. Do not commit real tokens publicly. Prefer environment variables in real projects.
- This repo includes a placeholder for convenience only; replace it locally and keep it private.
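One way to keep real tokens out of the repo is to read the environment first and fall back to `config.json` only for local development. This assumes `config.json` is a flat JSON object with an `HF_ACCESS_TOKEN` key:

```python
import json
import os

# Environment variable wins; config.json is a local-only fallback.
hf_token = os.environ.get("HF_ACCESS_TOKEN")
if hf_token is None:
    with open("config.json") as f:
        hf_token = json.load(f).get("HF_ACCESS_TOKEN")
```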
- If you see a file-not-found error for the dataset, ensure `verdict.txt` exists in the repo root and the filename in `train.py` matches exactly.
- If tokenizer load fails, verify that `Llama-2-7b/tokenizer.model` exists and is readable.
- CPU training is slow; reduce model size or move to CUDA.
- Inspired by Meta’s LLaMA architecture and various open-source reimplementations.
- Built with PyTorch and SentencePiece.
MIT