LLaMA 3.1 — From Scratch (PyTorch)

Train a minimal, readable LLaMA-style transformer end‑to‑end in PyTorch. This repo focuses on clarity over cleverness: straightforward modules, explicit configs, and a tiny training loop you can understand in one sitting.

Highlights

  • Clean architecture: LLamaModel with TransformerBlock, GroupedQueryAttention, RMSNorm, and gated MLP (a minimal RMSNorm sketch follows this list).
  • Tokenization: SentencePiece-based tokenizer compatible with LLaMA vocab files.
  • Tiny training loop: CPU default for portability; easy to switch to CUDA.
  • Zero magic: Plain PyTorch, no hidden frameworks.
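
A minimal sketch of the RMSNorm idea used in src/layers/rms.py (the real module may differ in epsilon and dtype handling):

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Scale activations by 1/RMS(x), then apply a learned per-dimension gain."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x / rms) * self.weight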

Quickstart

Prereqs: Python 3.11+, macOS/Linux, uv.

Setup (uv):

uv sync

Run training:

uv run -m src.training.train

By default the script runs on CPU and prints periodic train/val loss plus a short sample generation at the end of each epoch.


Project Structure

src/
  data/
    data_loader.py              # Streaming token loaders for train/val
  layers/
    feed_forward_layer.py       # Gated MLP block
    grouped_query_attention.py  # Grouped-query attention (GQA) with RoPE
    rms.py                      # RMSNorm layer
  models/
    block.py                    # Transformer block (attention + MLP + norms)
    llama3.py                   # LLamaModel definition
  tokenizer/
    tokenizer.py                # LlamaTokenizer (SentencePiece)
  training/
    loss.py                     # Cross-entropy loss helpers and eval
    train.py                    # Training loop entrypoint
  utils/
    tokenization.py             # Generation utilities, HF login + downloads

Top-level:

  • pyproject.toml: package metadata and dependencies
  • config.json: holds secrets like HF_ACCESS_TOKEN (see Security)
  • Llama-2-7b/tokenizer.model: SentencePiece model file (vocab)
  • verdict.txt: example text corpus (small demo dataset)
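
To show how these pieces compose, here is an illustrative pre-norm block in the LLaMA style. It is a sketch only: the attention, MLP, and norm modules are passed in, and the real block.py may differ in constructor and details.

import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm residual block: x + attn(norm(x)), then x + mlp(norm(x))."""
    def __init__(self, attn: nn.Module, mlp: nn.Module, norm1: nn.Module, norm2: nn.Module):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.norm1, self.norm2 = norm1, norm2

    def forward(self, x):
        x = x + self.attn(self.norm1(x))  # attention sub-layer with residual
        x = x + self.mlp(self.norm2(x))   # gated MLP sub-layer with residual
        return x

LLamaModel stacks n_layers of these blocks; see llama3.py for the exact wiring.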

Data & Tokenizer

  • Place your training text in a single file, e.g. verdict.txt.
  • Tokenizer uses SentencePiece. A compatible LLaMA tokenizer file should be available at Llama-2-7b/tokenizer.model.

If you need to authenticate to download the tokenizer, set a Hugging Face token (see Security) and use the utilities in src/utils/tokenization.py.
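
The repo's LlamaTokenizer wraps SentencePiece; to sanity-check the vocab file directly with the sentencepiece library:

import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="Llama-2-7b/tokenizer.model")
ids = sp.encode("Hello, LLaMA!", out_type=int)
print(ids)             # token ids
print(sp.decode(ids))  # round-trip back to text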


Training

Main entrypoint: src/training/train.py

Key hyperparameters (edit inside the script):

config = {
    "emb_dim": 4096,
    "n_layers": 32,
    "n_heads": 32,
    "num_kv_heads": 8,
    "hidden_dim": 11008,
    "context_length": 4096,
    "vocab_size": 32000,
    "dtype": torch.bfloat16,
}

Other knobs:

  • num_epochs, eval_freq, eval_iter
  • batch_size, stride, shuffle in the data loader (see the sketch after this list)
  • start_context used for sample generation between epochs
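
The real loader API lives in src/data/data_loader.py; as a hypothetical illustration of what batch_size, stride, and shuffle control, a sliding-window token loader typically looks like this (the names here are made up for the example):

import torch
from torch.utils.data import Dataset, DataLoader

class TokenWindowDataset(Dataset):
    """Slices a long token stream into overlapping (input, target) windows."""
    def __init__(self, token_ids, context_length, stride):
        self.samples = []
        for i in range(0, len(token_ids) - context_length, stride):
            x = torch.tensor(token_ids[i : i + context_length])
            y = torch.tensor(token_ids[i + 1 : i + context_length + 1])
            self.samples.append((x, y))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

token_ids = list(range(10_000))  # placeholder; in practice, ids from the tokenizer
loader = DataLoader(
    TokenWindowDataset(token_ids, context_length=256, stride=128),
    batch_size=4,
    shuffle=True,
)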

Switching to CUDA:

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

Remember to move the input and target batches to the same device inside the training loop.

Inference (toy example)

The training script prints a short completion each epoch using helpers from src/utils/tokenization.py.

For ad‑hoc generation after training, you can import the same utilities and call generate_next_tokens with your LLamaModel and LlamaTokenizer.
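
The exact signature of generate_next_tokens is defined in src/utils/tokenization.py; as a rough equivalent, greedy decoding with any LLaMA-style model that returns (batch, seq, vocab) logits looks like this:

import torch

@torch.no_grad()
def greedy_generate(model, token_ids, max_new_tokens, context_length):
    """Repeatedly append the argmax next token (greedy decoding)."""
    model.eval()
    ids = torch.tensor(token_ids).unsqueeze(0)        # (1, seq)
    for _ in range(max_new_tokens):
        logits = model(ids[:, -context_length:])      # (1, seq, vocab) assumed
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    return ids.squeeze(0).tolist()

Decode the returned ids with the tokenizer (SentencePiece decode) to get text back.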


Security

  • config.json may contain secrets like HF_ACCESS_TOKEN. Do not commit real tokens publicly. Prefer environment variables in real projects (see the sketch below).
  • This repo includes a placeholder for convenience only; replace it locally and keep it private.
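
A minimal sketch of the environment-variable approach, assuming huggingface_hub is available (the login/download utilities in src/utils/tokenization.py may already handle this for you):

import os
from huggingface_hub import login

hf_token = os.environ.get("HF_ACCESS_TOKEN")  # export HF_ACCESS_TOKEN=... in your shell
if hf_token is None:
    raise RuntimeError("HF_ACCESS_TOKEN is not set; avoid hard-coding it in config.json")
login(token=hf_token)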

Troubleshooting

  • If you see a file-not-found error for the dataset, ensure verdict.txt exists in the repo root and the filename in train.py matches exactly.
  • If tokenizer load fails, verify Llama-2-7b/tokenizer.model exists and is readable.
  • CPU training is slow; reduce the model size (see the example config below) or move to CUDA.
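
For quick CPU experiments, an illustrative downsized config (the values are arbitrary; only the keys must match the Training section, and vocab_size must still match the tokenizer):

import torch

config = {
    "emb_dim": 256,
    "n_layers": 4,
    "n_heads": 4,
    "num_kv_heads": 2,
    "hidden_dim": 1024,
    "context_length": 256,
    "vocab_size": 32000,      # must match the SentencePiece vocab
    "dtype": torch.float32,   # bfloat16 matmuls can be slow on CPU
}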

Acknowledgements

  • Inspired by Meta’s LLaMA architecture and various open-source reimplementations.
  • Built with PyTorch and SentencePiece.

License

MIT
