Minimal example demonstrating that Megalodon outperforms a Llama-style Transformer on character-level language modeling after the same number of training steps. A companion repo to megalodon-hf.
Similarly-sized architectures trained on enwik8 for 1200 steps (10% of the dataset):
| Model | Parameters | Val loss @ step 1100 | BPC | VRAM | Train time |
|---|---|---|---|---|---|
| Megalodon | 11.3M | 1.451 | 2.09 | 12 GB | 8m 09s |
| Llama | 12.5M | 1.542 | 2.22 | 7 GB | 3m 07s |
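For reference, BPC appears to be the validation loss (assumed to be in nats per character) converted to bits; the snippet below reproduces the table's BPC column from its loss column:

```python
import math

# Assumed nats-to-bits conversion behind the BPC column above.
print(1.451 / math.log(2))  # Megalodon: ~2.09
print(1.542 / math.log(2))  # Llama:     ~2.22
```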
Megalodon achieves 5.9% lower loss with 10% fewer parameters, but at higher compute cost (~2.6x slower, ~1.7x VRAM).
- This is expected: PyTorch has no native (fused) kernels for Megalodon's complex-valued EMA operators, and megalodon-hf deliberately avoids the custom CUDA kernels used by the upstream implementation, while Transformers benefit from years of kernel optimization. A minimal sketch of the recurrence involved follows this list.
- See RESULTS.md for experimental details and mitigation paths w.r.t. speed and memory.
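To make the kernel point concrete, here is a minimal, illustrative sketch of a complex-valued EMA recurrence. It is not the megalodon-hf implementation, and the parameter names (`alpha`, `delta`) and shapes are assumptions; the point is that a naive PyTorch version is a sequential scan over time with no fused kernel to amortize it:

```python
import torch

# Illustrative only -- not the megalodon-hf code; names and shapes are assumptions.
def complex_ema(x: torch.Tensor, alpha: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Naive complex-valued EMA scan: x is (batch, seq, dim) real,
    alpha/delta are (dim,) complex input/decay coefficients."""
    h = torch.zeros(x.shape[0], x.shape[2], dtype=torch.cfloat, device=x.device)
    outputs = []
    for t in range(x.shape[1]):                            # sequential scan over time
        h = delta * h + alpha * x[:, t].to(torch.cfloat)   # complex state update
        outputs.append(h.real)                             # project back to the reals
    return torch.stack(outputs, dim=1)
```

The attention blocks in the Llama baseline, by contrast, map onto heavily optimized matmul/SDPA kernels, which is the asymmetry the time and VRAM gap above reflects.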
This repo exists to demonstrate that Megalodon can outperform Transformers in a controlled setting, and as such focuses on correctness & readability over speed.
```bash
# Install
pip install -e .

# Train Megalodon (primary)
python train.py --config configs/megalodon_multichunk_512.yaml

# Train Llama baseline
python train.py --config configs/llama_512.yaml
```

What this is: a sanity check / minimal working example (MWE) showing that megalodon-hf works. It demonstrates that Megalodon learns to generate coherent text at small scale and beats an equivalent Transformer.

What this is not:

- A pretraining framework
- A comprehensive benchmark suite
- Production-ready training code
The modeling code can be found here.
- PyTorch >= 2.9.0 with bfloat16 support (Ampere+ GPU or modern CPU)
- float16 is NOT supported due to numerical overflow in the complex EMA (see the dtype sketch below)
Dependencies are installed via `pip install -e .`
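As a minimal sketch of how a training script might respect these constraints (an illustration, not the repo's actual code; `pick_dtype` is a hypothetical helper):

```python
import torch

def pick_dtype(device: torch.device) -> torch.dtype:
    """Hypothetical helper: prefer bfloat16; never float16,
    since the complex EMA overflows in fp16."""
    if device.type == "cuda" and not torch.cuda.is_bf16_supported():
        # Pre-Ampere GPUs: fall back to full precision rather than fp16.
        return torch.float32
    return torch.bfloat16
```

Such a dtype would typically be passed to `torch.autocast(device_type=device.type, dtype=...)` around the forward pass.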
| Device | Status | Notes |
|---|---|---|
| NVIDIA GPU | Best | flash attention, SDPA for supported ops |
| Apple Silicon | Untested[^1] | MPS backend with autocast; `llama.py` confirmed working in the repo template |
| CPU | Works | Slow; use for testing only |
Override the device with `FORCE_DEVICE=cuda`, `FORCE_DEVICE=mps`, or `FORCE_DEVICE=cpu`.
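For illustration, a device-resolution helper consistent with the table and the `FORCE_DEVICE` override might look like the following; this is an assumption, not necessarily what `utils.py` does:

```python
import os
import torch

def resolve_device() -> torch.device:
    """Honor FORCE_DEVICE if set, otherwise prefer CUDA, then MPS, then CPU."""
    forced = os.environ.get("FORCE_DEVICE")
    if forced:
        return torch.device(forced)          # "cuda", "mps", or "cpu"
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```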
Repo structure:

```
megalodon-enwik8/
├── megalodon_enwik8/                    # Model implementations
│   ├── megalodon.py                     # MegalodonLM wrapper
│   ├── llama.py                         # Llama baseline
│   └── utils.py                         # Sampling & device helpers
├── configs/
│   ├── megalodon_multichunk_512.yaml    # Primary config
│   ├── llama_512.yaml                   # Matched Llama baseline
│   └── test.yaml                        # Quick smoke test
├── data/
│   └── enwik8.gz                        # Character-level dataset
├── train.py                             # Training script
├── inference.py                         # Inference utility
└── RESULTS.md                           # Experimental results
```
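As a rough sketch of what character-level means for `data/enwik8.gz` (an assumption about the data handling; the actual loader in `train.py` may differ), the file can be read as a byte stream and each byte treated as a token:

```python
import gzip
import numpy as np

# Assumed handling, not necessarily train.py's loader: enwik8 as raw bytes,
# one token per byte (vocab size <= 256).
with gzip.open("data/enwik8.gz", "rb") as f:
    data = np.frombuffer(f.read(), dtype=np.uint8)
print(len(data))  # the full enwik8 dump is 10^8 bytes
```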
License: MIT (this repo) / Apache-2.0 (megalodon-hf)
Megalodon architecture:

```bibtex
@misc{ma2024megalodon,
  title={Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length},
  author={Xuezhe Ma and Xiaomeng Yang and Wenhan Xiong and Beidi Chen and Lili Yu and Hao Zhang and Jonathan May and Luke Zettlemoyer and Omer Levy and Chunting Zhou},
  year={2024},
  eprint={2404.08801},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

Llama baseline:
```bibtex
@misc{touvron2023llama2,
  title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
  author={Hugo Touvron and others},
  year={2023},
  eprint={2307.09288},
  archivePrefix={arXiv}
}
```

[^1]: I do not have access to Apple Silicon hardware for testing Megalodon-specific training. The repo template did test `llama.py` on MPS and confirmed it works. Please open an issue/PR if you try this out.