Nano-Llama is a 110-million-parameter language model built from scratch in PyTorch. It follows the modern Llama architecture (Rotary Positional Embeddings, RMSNorm, SwiGLU, and FlashAttention) and was engineered and optimized to train efficiently on a highly constrained 4GB VRAM consumer GPU. The model was trained on the TinyStories dataset to generate coherent, grammatically correct English narratives.
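Two of the architectural components named above, RMSNorm and SwiGLU, can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual Nano-Llama source; the dimensions below are stand-ins, not the real 110M-parameter configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: scales by the RMS of the features,
    with no mean-centering and no bias (unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Llama-style gated MLP: silu(x @ W1) * (x @ W3), projected down by W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# Illustrative shapes only: batch 2, sequence length 16, model dim 512.
x = torch.randn(2, 16, 512)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))
```

Dropping the mean and bias terms makes RMSNorm slightly cheaper than LayerNorm, which matters when every kernel launch counts on a 4GB GPU.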
- Dataset: TinyStories
- Tokenizer: Llama-2 Tokenizer (Vocab Size: 32,000)
- Train Tokens: 516,514,371
- Validation Tokens: 5,195,911
- Total Training Corpus: 521,710,282 tokens (1 Full Epoch)
- GPU: NVIDIA GeForce RTX 3050 (4GB VRAM)
- Training Time: ~13.5 hours
- Peak Memory Usage: ~3.07 GB (75% VRAM utilization)
- Throughput: ~10,600 tokens per second (TPS) on average
- Optimization Techniques: Bfloat16 precision, Gradient Accumulation (Global Batch Size of 64), Asynchronous CUDA-stream data prefetching, and Fused AdamW.
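Two of the techniques above, bfloat16 autocast and gradient accumulation with fused AdamW, combine into a compact training step. The sketch below is a simplified illustration under assumed names: `model`, the micro-batch size, and `accum_steps` are stand-ins, and the stand-in `nn.Linear` is not the real 110M model. The fused AdamW kernel is only available on CUDA, hence the guard.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(64, 32000).to(device)  # stand-in for the real model
opt = torch.optim.AdamW(model.parameters(), lr=5e-4,
                        fused=torch.cuda.is_available())  # fused kernel needs CUDA
accum_steps = 8  # e.g. micro-batch 8 x 8 accumulation steps = global batch 64

def train_step(micro_batches):
    """One optimizer step: accumulate grads over several micro-batches."""
    opt.zero_grad(set_to_none=True)
    for x, y in micro_batches:
        # bf16 autocast keeps activations in bfloat16 while the master
        # weights and the optimizer state stay in fp32.
        with torch.autocast(device_type=device, dtype=torch.bfloat16,
                            enabled=(device == "cuda")):
            loss = nn.functional.cross_entropy(model(x), y) / accum_steps
        loss.backward()  # gradients accumulate across micro-batches
    opt.step()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equal (in expectation) to one large-batch gradient, so the effective global batch size is 64 without ever holding 64 sequences in 4GB of VRAM.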
- Total Steps: 15,762
- Initial Loss: 682.0000 (Step 0)
- Final Training Loss: 1.6514 (Step 15,762)
- Learning Rate Schedule: Cosine decay with warmup (Max: 5e-4, Min: 5e-5)
- Convergence: The model showed rapid learning, dropping below a loss of 10.0 by step 600 and smoothly converging to ~1.65 by the end of the epoch, indicating a strong grasp of the dataset's vocabulary and grammar.
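The cosine-with-warmup schedule can be written as a small pure-Python function. The max/min learning rates and total step count come from the numbers above; the warmup length is a hypothetical placeholder, since the actual value used in training is not stated here.

```python
import math

MAX_LR, MIN_LR = 5e-4, 5e-5
TOTAL_STEPS = 15_762
WARMUP_STEPS = 600  # hypothetical -- the actual warmup length is not given above

def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # progress goes from 0.0 (end of warmup) to 1.0 (final step)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```

The cosine term sweeps from +1 to -1 as training progresses, so the learning rate glides from 5e-4 down to exactly 5e-5 at the final step.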
(Below are the loss curve and learning rate schedule over the 15.7k training steps)

Below are three examples of the trained model generating text from a given prompt.
Example 1:
- Input: "Once upon a time, there was a little dog named"
- Output: "Once upon a time, there was a little dog named Max. Max loved to play in the park with his best friend, a little girl named Lily. One day, Lily and Max went to the park to play. Max saw a big ball and ran to play with it. They played and played until it was time to go home.Suddenly, Lily heard a loud noise. It was an ambulance with its lights flashing. Max and Lily were scared, but they knew they had to be brave. They watched as the ambulance drove away. Max and Lily held hands and walked home safely.Max was so happy to have helped his friend. He knew that when he saw something was wrong, he could always ask for help and make friends. From that day on, Max and Lily looked out for each other and remembered that they had a secret to help them. Max knew that even though the ambulance wasight, he had learned an important lesson about helping others."
Example 2:
- Input: "The big red ball rolled down the"
- Output: "The big red ball rolled down the hill. Lily was surprised and happy. She said, "Hi, I'm Lily. Do you want to play with me?" The big red ball replied, "Yes, I want to play with you. Thank you, Lily." Lily and the big red ball played together on the grass. They laughed and had a lot of fun. When it was time to go home, Lily said, "Bye, big red ball. See you tomorrow." The big red ball said, "Bye, Lily!" and waved. Lily waved goodbye to the big red ball and went home."
Example 3:
- Input: "Lily wanted to play outside, but it started to"
- Output: "Lily wanted to play outside, but it started to rain. Lily was scared of the rain, so she ran back inside.When she got inside, she saw her mommy and daddy sitting on the couch. They were watching TV together and having fun. Lily felt happy and safe with her mommy and daddy."
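Samples like the ones above can be produced with a standard autoregressive sampling loop. The sketch below is a generic illustration, not the project's actual inference code: the temperature, top-k value, and EOS id are assumed defaults, and `model` is any callable mapping `(batch, seq)` token ids to `(batch, seq, vocab)` logits.

```python
import torch

@torch.no_grad()
def generate(model, ids, max_new_tokens=200, temperature=0.8, top_k=50, eos_id=2):
    """Sample tokens one at a time, feeding each prediction back in.

    ids: LongTensor of shape (1, prompt_len) holding the prompt token ids.
    """
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature   # logits for the next token
        if top_k is not None:
            # Mask everything below the k-th largest logit.
            kth = torch.topk(logits, top_k).values[:, -1, None]
            logits[logits < kth] = float("-inf")
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample, don't argmax
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == eos_id:  # stop at end-of-sequence
            break
    return ids
```

Temperature and top-k trade off coherence against variety; sampling (rather than greedy argmax) is what lets the same prompt produce different stories on different runs.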