Nano-Llama is a 110-million-parameter language model built from scratch in PyTorch. It follows the modern Llama architecture (Rotary Positional Embeddings, RMSNorm, SwiGLU, and FlashAttention) and was engineered and optimized to train efficiently on a highly constrained 4GB VRAM consumer GPU. The model was trained on the TinyStories dataset to generate coherent, grammatically correct English narratives.
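Two of the architectural components named above, RMSNorm and SwiGLU, can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual Nano-Llama source; the dimensions below are stand-ins, not the real 110M-parameter configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: scales by the RMS of the features,
    with no mean-centering and no bias (unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Llama-style gated MLP: silu(x @ W1) * (x @ W3), projected down by W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# Illustrative shapes only: batch 2, sequence length 16, model dim 512.
x = torch.randn(2, 16, 512)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))
```

Dropping the mean and bias terms makes RMSNorm slightly cheaper than LayerNorm, which matters when every kernel launch counts on a 4GB GPU.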
- Dataset: TinyStories
- Tokenizer: Llama-2 Tokenizer (Vocab Size: 32,000)
- Train Tokens: 516,514,371
- Validation Tokens: 5,195,911
- Total Training Corpus: 521,710,282 tokens (1 Full Epoch)
- GPU: NVIDIA GeForce RTX 3050 (4GB VRAM)
- Training Time: ~13.5 hours
- Peak Memory Usage: ~3.07 GB (75% VRAM utilization)
- Throughput: ~10,600 tokens per second (TPS) on average
- Optimization Techniques: Bfloat16 precision, Gradient Accumulation (Global Batch Size of 64), Asynchronous CUDA-stream data prefetching, and Fused AdamW.
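Two of the techniques above, bfloat16 autocast and gradient accumulation with fused AdamW, combine into a compact training step. The sketch below is a simplified illustration under assumed names: `model`, the micro-batch size, and `accum_steps` are stand-ins, and the stand-in `nn.Linear` is not the real 110M model. The fused AdamW kernel is only available on CUDA, hence the guard.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(64, 32000).to(device)  # stand-in for the real model
opt = torch.optim.AdamW(model.parameters(), lr=5e-4,
                        fused=torch.cuda.is_available())  # fused kernel needs CUDA
accum_steps = 8  # e.g. micro-batch 8 x 8 accumulation steps = global batch 64

def train_step(micro_batches):
    """One optimizer step: accumulate grads over several micro-batches."""
    opt.zero_grad(set_to_none=True)
    for x, y in micro_batches:
        # bf16 autocast keeps activations in bfloat16 while the master
        # weights and the optimizer state stay in fp32.
        with torch.autocast(device_type=device, dtype=torch.bfloat16,
                            enabled=(device == "cuda")):
            loss = nn.functional.cross_entropy(model(x), y) / accum_steps
        loss.backward()  # gradients accumulate across micro-batches
    opt.step()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equal (in expectation) to one large-batch gradient, so the effective global batch size is 64 without ever holding 64 sequences in 4GB of VRAM.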
- Total Steps: 15,762
- Initial Loss: 682.0000 (Step 0)
- Final Training Loss: 1.6514 (Step 15,762)
- Learning Rate Schedule: Cosine decay with warmup (Max: 5e-4, Min: 5e-5)
- Convergence: The model showed rapid learning, dropping below a loss of 10.0 by step 600 and smoothly converging to ~1.65 by the end of the epoch, indicating a strong grasp of the dataset's vocabulary and grammar.
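The cosine-with-warmup schedule can be written as a small pure-Python function. The max/min learning rates and total step count come from the numbers above; the warmup length is a hypothetical placeholder, since the actual value used in training is not stated here.

```python
import math

MAX_LR, MIN_LR = 5e-4, 5e-5
TOTAL_STEPS = 15_762
WARMUP_STEPS = 600  # hypothetical -- the actual warmup length is not given above

def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # progress goes from 0.0 (end of warmup) to 1.0 (final step)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

The cosine term sweeps from +1 to -1 as training progresses, so the learning rate glides from 5e-4 down to exactly 5e-5 at the final step.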
(Below are the loss curve and learning rate schedule over the 15.7k training steps)

Below are three examples of the trained model generating text from a given prompt.
Example 1:
- Input: "Once upon a time, there was a little dog named"
- Output: "Once upon a time, there was a little dog named Max. Max loved to play in the park with his best friend, a little girl named Lily. One day, Lily and Max went to the park to play. Max saw a big ball and ran to play with it. They played and played until it was time to go home.Suddenly, Lily heard a loud noise. It was an ambulance with its lights flashing. Max and Lily were scared, but they knew they had to be brave. They watched as the ambulance drove away. Max and Lily held hands and walked home safely.Max was so happy to have helped his friend. He knew that when he saw something was wrong, he could always ask for help and make friends. From that day on, Max and Lily looked out for each other and remembered that they had a secret to help them. Max knew that even though the ambulance wasight, he had learned an important lesson about helping others."
Example 2:
- Input: "The big red ball rolled down the"
- Output: "The big red ball rolled down the hill. Lily was surprised and happy. She said, "Hi, I'm Lily. Do you want to play with me?" The big red ball replied, "Yes, I want to play with you. Thank you, Lily." Lily and the big red ball played together on the grass. They laughed and had a lot of fun. When it was time to go home, Lily said, "Bye, big red ball. See you tomorrow." The big red ball said, "Bye, Lily!" and waved. Lily waved goodbye to the big red ball and went home."
Example 3:
- Input: "Lily wanted to play outside, but it started to"
- Output: "Lily wanted to play outside, but it started to rain. Lily was scared of the rain, so she ran back inside.When she got inside, she saw her mommy and daddy sitting on the couch. They were watching TV together and having fun. Lily felt happy and safe with her mommy and daddy."
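Samples like the ones above can be produced with a standard autoregressive sampling loop. The sketch below is a generic illustration, not the project's actual inference code: the temperature, top-k value, and EOS id are assumed defaults, and `model` is any callable mapping `(batch, seq)` token ids to `(batch, seq, vocab)` logits.

```python
import torch

@torch.no_grad()
def generate(model, ids, max_new_tokens=200, temperature=0.8, top_k=50, eos_id=2):
    """Sample tokens one at a time, feeding each prediction back in.

    ids: LongTensor of shape (1, prompt_len) holding the prompt token ids.
    """
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature   # logits for the next token
        if top_k is not None:
            # Mask everything below the k-th largest logit.
            kth = torch.topk(logits, top_k).values[:, -1, None]
            logits[logits < kth] = float("-inf")
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample, don't argmax
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == eos_id:  # stop at end-of-sequence
            break
    return ids
```

Temperature and top-k trade off coherence against variety; sampling (rather than greedy argmax) is what lets the same prompt produce different stories on different runs.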