Train a minimal, readable LLaMA-style transformer end‑to‑end in PyTorch. This repo focuses on clarity over cleverness: straightforward modules, explicit configs, and a tiny training loop you can understand in one sitting.
- Clean architecture: `LLamaModel` with `TransformerBlock`, `GroupedQueryAttention`, `RMSNorm`, and a gated MLP (see the `RMSNorm` sketch after this list).
- Tokenization: SentencePiece-based tokenizer compatible with LLaMA vocab files.
- Tiny training loop: CPU default for portability; easy to switch to CUDA.
- Zero magic: Plain PyTorch, no hidden frameworks.
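For a flavor of the code style, here is a minimal RMSNorm in the spirit of `src/layers/rms.py`. This is a sketch of the standard LLaMA-style normalization, not the repo's exact implementation:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm as used in LLaMA (illustrative sketch)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal RMS over the last dimension, then apply the gain.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)
```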
Prereqs: Python 3.11+, macOS/Linux, uv.
Setup (uv):

```
uv sync
```

Run training:

```
uv run -m src.training.train
```

By default the script runs on CPU and prints periodic train/val loss plus a short sample generation at the end of each epoch.
```
src/
  data/
    data_loader.py             # Creates streaming token loaders for train/val
  layers/
    feed_forward_layer.py      # Gated MLP block
    grouped_query_attention.py # Multi-head attention with GQA
    rms.py                     # RMSNorm layer
  models/
    block.py                   # Transformer block (attention + MLP + norms)
    llama3.py                  # LLamaModel definition
  tokenizer/
    tokenizer.py               # LlamaTokenizer (SentencePiece)
  training/
    loss.py                    # Cross-entropy loss helpers and eval
    train.py                   # Training loop entrypoint
  utils/
    tokenization.py            # Generation utilities, HF login + downloads
```
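To make the layout concrete, the streaming loaders in `data_loader.py` boil down to sliding windows of next-token pairs over one long token stream. The class and argument names below are illustrative assumptions, not the repo's actual API:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TokenWindowDataset(Dataset):
    """Sliding (input, target) windows over a token list (illustrative sketch)."""

    def __init__(self, token_ids: list[int], context_length: int, stride: int):
        self.inputs, self.targets = [], []
        # Each window's target is the same window shifted one token to the right.
        for i in range(0, len(token_ids) - context_length, stride):
            chunk = token_ids[i : i + context_length + 1]
            self.inputs.append(torch.tensor(chunk[:-1]))
            self.targets.append(torch.tensor(chunk[1:]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Hypothetical usage: token_ids would come from LlamaTokenizer over verdict.txt.
# loader = DataLoader(TokenWindowDataset(token_ids, 256, 128), batch_size=4, shuffle=True)
```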
Top-level:
- `pyproject.toml`: package metadata and dependencies
- `config.json`: holds secrets like `HF_ACCESS_TOKEN` (see Security)
- `Llama-2-7b/tokenizer.model`: SentencePiece model file (vocab)
- `verdict.txt`: example text corpus (small demo dataset)
- Place your training text in a single file, e.g. `verdict.txt`.
- The tokenizer uses SentencePiece. A compatible LLaMA tokenizer file should be available at `Llama-2-7b/tokenizer.model`.

If you need to authenticate to download the tokenizer, set a Hugging Face token (see Security) and use the utilities in `src/utils/tokenization.py`.
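If you prefer to hit the Hub directly instead of the repo's helpers, a sketch with `huggingface_hub` might look like this. The repo id is an assumption; `meta-llama/Llama-2-7b` is gated, so you must accept Meta's license on huggingface.co first:

```python
import os
from huggingface_hub import login, hf_hub_download

# Prefer an environment variable over config.json for the token (see Security).
login(token=os.environ["HF_ACCESS_TOKEN"])

# Downloads tokenizer.model into the Llama-2-7b/ directory.
path = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",  # assumed repo id; gated access
    filename="tokenizer.model",
    local_dir="Llama-2-7b",
)
print(path)
```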
Main entrypoint: `src/training/train.py`
Key hyperparameters (edit inside the script):
```python
config = {
    "emb_dim": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "num_kv_heads": 8,
    "hidden_dim": 11008,
    "context_length": 4096,
    "vocab_size": 32000,
    "dtype": torch.bfloat16,
}
```

Other knobs:
- `num_epochs`, `eval_freq`, `eval_iter`
- `batch_size`, `stride`, `shuffle` in the data loader
- `start_context` used for sample generation between epochs
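The values above match a 7B-scale model, which is far too large for CPU training. For a quick smoke test, a scaled-down config might look like the following; the specific values are illustrative, not defaults shipped with the repo:

```python
import torch

config = {
    "emb_dim": 256,          # narrow embedding width
    "n_layers": 4,           # a handful of transformer blocks
    "n_heads": 8,            # attention heads
    "num_kv_heads": 2,       # grouped-query KV heads (must divide n_heads)
    "hidden_dim": 688,       # gated-MLP width, ~2.7x emb_dim as in LLaMA
    "context_length": 256,   # shorter sequences keep iteration fast
    "vocab_size": 32000,     # must match the SentencePiece model
    "dtype": torch.float32,  # bfloat16 can be slow or unsupported on CPU
}
```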
Switching to CUDA:

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```

The training script prints a short completion each epoch using helpers from `src/utils/tokenization.py`.
For ad‑hoc generation after training, you can import the same utilities and call `generate_next_tokens` with your `LLamaModel` and `LlamaTokenizer`, as sketched below.
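A hypothetical call might look like the following; the argument names and constructor signatures are assumptions, so check `src/utils/tokenization.py` and the model/tokenizer modules for the real interfaces:

```python
# Argument names below are assumptions; see src/utils/tokenization.py.
from src.models.llama3 import LLamaModel
from src.tokenizer.tokenizer import LlamaTokenizer
from src.utils.tokenization import generate_next_tokens

tokenizer = LlamaTokenizer("Llama-2-7b/tokenizer.model")  # assumed constructor
model = LLamaModel(config)  # config as defined above
model.eval()

print(generate_next_tokens(model, tokenizer, "Every effort moves you", max_new_tokens=50))
```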
- `config.json` may contain secrets like `HF_ACCESS_TOKEN`. Do not commit real tokens publicly. Prefer environment variables in real projects.
- This repo includes a placeholder for convenience only; replace it locally and keep it private.
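One way to keep real tokens out of the repo is to read the environment first and fall back to `config.json` only for local development. This assumes `config.json` is a flat JSON object with an `HF_ACCESS_TOKEN` key:

```python
import json
import os

# Environment variable wins; config.json is a local-only fallback.
hf_token = os.environ.get("HF_ACCESS_TOKEN")
if hf_token is None:
    with open("config.json") as f:
        hf_token = json.load(f).get("HF_ACCESS_TOKEN")
```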
- If you see a file-not-found error for the dataset, ensure `verdict.txt` exists in the repo root and the filename in `train.py` matches exactly.
- If tokenizer load fails, verify that `Llama-2-7b/tokenizer.model` exists and is readable.
- CPU training is slow; reduce model size or move to CUDA.
- Inspired by Meta’s LLaMA architecture and various open-source reimplementations.
- Built with PyTorch and SentencePiece.
MIT