🧠 Neutro: The "Old School" Deep Learning Playground

Neutro is an intentionally naive, NumPy-only implementation of modern deep learning architectures. It’s the Keras experience you love, powered by the NumPy you tolerate, built specifically for people who want to peek under the hood and actually understand how the gears turn.

👴 The Philosophy: Why Does This Exist?

Let's be honest: modern DL frameworks are black boxes. You pip install 4GB of binaries and suddenly you're "doing AI."

Neutro is for the curious, the learners, and the "old-school" folks like me who believe that if you can't build it in a matrix, you don't really know it.

Learn, Don't Just Run: Every line of code is designed to be readable. We don't hide behind C++ or CUDA kernels. If you want to know how FlashAttention actually tiles memory, you can just read the Python file.
A Toy, not a Tool: This isn't meant for production. It's a playground for learning advanced algorithms (MHA, GQA, FlashAttention, LSTM) in their purest form.
For the Wisdom-Rich: If you remember when 64MB of RAM was a flex and "vectorization" meant loop unrolling, this is for you. It's a fun way to play with cutting-edge 2024 algorithms using 1990s-era clarity.

🔄 Autograd — From Scratch, In NumPy

Neutro doesn't lean on TensorFlow, JAX, or any other third-party autograd engine: it is its own autograd engine.

We built neutro.autograd from scratch in pure NumPy: a Tensor wrapper with a GradientTape that records every operation and computes gradients via reverse-mode AD. Ops like softmax, matmul, and relu all register backward closures on the tape.
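
To make the idea concrete, here is a minimal sketch of a tape that records backward closures and replays them in reverse. It is not the code in neutro.autograd: the names `Tape`, `Tensor`, `matmul`, `relu`, and `total` are hypothetical stand-ins picked for this illustration, and the real API may look quite different.

```python
import numpy as np

class Tape:
    """Hypothetical tape: stores one backward closure per op, in execution order."""
    def __init__(self):
        self.closures = []

    def gradient(self, loss, sources):
        # Seed d(loss)/d(loss) = 1, then replay the closures in reverse order.
        loss.grad = np.ones_like(loss.data)
        for backward in reversed(self.closures):
            backward()
        return [s.grad for s in sources]

class Tensor:
    """Hypothetical wrapper: a NumPy array plus an accumulated gradient."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)

def matmul(tape, a, b):
    out = Tensor(a.data @ b.data)
    def backward():
        a.grad += out.grad @ b.data.T   # dL/da = dL/dout @ b^T
        b.grad += a.data.T @ out.grad   # dL/db = a^T @ dL/dout
    tape.closures.append(backward)
    return out

def relu(tape, x):
    out = Tensor(np.maximum(x.data, 0.0))
    def backward():
        x.grad += out.grad * (x.data > 0)  # gradient flows only where x > 0
    tape.closures.append(backward)
    return out

def total(tape, x):
    out = Tensor(x.data.sum())
    def backward():
        x.grad += out.grad * np.ones_like(x.data)
    tape.closures.append(backward)
    return out

tape = Tape()
x = Tensor([[1.0, 2.0]])
w = Tensor([[0.5, -1.0], [0.25, -2.0]])
loss = total(tape, relu(tape, matmul(tape, x, w)))
(dw,) = tape.gradient(loss, [w])
print(dw)
```

Replaying in reverse works because the order in which ops ran is already a valid topological order of the graph, so every closure sees its output gradient fully accumulated before it runs.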

Two ways to learn backprop:

The Tape (default): Most layers use base.Layer.backward(), which re-runs the forward pass inside a GradientTape and lets the tape compute all gradients automatically. This is clean, composable, and easy to maintain.
The Manual Way: Any layer can override backward(grad_output) and compute gradients with explicit chain rule math. This is still supported and used by a few layers where the educational value of hand-writing the gradient is highest.

Both paths teach you something. The tape shows you how autograd engines work internally (docs/autograd/). The manual path shows you the actual chain rule math for each layer.
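
For a taste of the manual path, here is a rough sketch of a dense layer with a hand-written backward(grad_output), checked against a finite-difference estimate. The class name `ManualDense` and its attributes (`dW`, `db`) are assumptions made for this example, not the actual neutro layer.

```python
import numpy as np

class ManualDense:
    """Hypothetical layer: y = x @ W + b with an explicit chain-rule backward."""
    def __init__(self, in_features, out_features, rng):
        self.W = rng.standard_normal((in_features, out_features)) * 0.1
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x  # cache the input; backward needs it
        return x @ self.W + self.b

    def backward(self, grad_output):
        self.dW = self.x.T @ grad_output   # dL/dW = x^T @ dL/dy
        self.db = grad_output.sum(axis=0)  # dL/db = sum of dL/dy over the batch
        return grad_output @ self.W.T      # dL/dx = dL/dy @ W^T

rng = np.random.default_rng(0)
layer = ManualDense(3, 2, rng)
x = rng.standard_normal((4, 3))

# Use L = 0.5 * sum(y ** 2), so dL/dy is simply y.
y = layer.forward(x)
dx = layer.backward(y)

def loss():
    return 0.5 * np.sum(layer.forward(x) ** 2)

# Central finite difference on a single weight as a sanity check.
eps = 1e-6
layer.W[0, 1] += eps
plus = loss()
layer.W[0, 1] -= 2 * eps
minus = loss()
layer.W[0, 1] += eps  # restore the original weight
numeric_dW01 = (plus - minus) / (2 * eps)
print(numeric_dW01, layer.dW[0, 1])
```

A finite-difference check like this is a cheap way to catch a transposed matrix or a forgotten batch sum in any hand-derived gradient.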

🚀 What's Inside?

"I can't believe it's not Keras!": Your muscle memory is safe here. .compile(), .fit(), .predict()—it’s all exactly where you left it.
Pure NumPy Math: We did the math so you don't have to. Every gradient, from Softmax to LSTM gates, is hand-derived and vectorized.
Speed (for a CPU): We use im2col for convolutions and FlashAttention (yes, really) to keep your CPU fans humming in a way that sounds productive.
Zero Heavy Dependencies: Tired of downloading 4GB of CUDA binaries just to train on MNIST? We require exactly numpy and scipy. That’s it.
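
If im2col is new to you, the trick is to unfold every receptive field into a row so the whole convolution becomes one matrix multiply. The sketch below assumes channels-last input, stride 1, and no padding; the function names `im2col`, `conv2d_im2col`, and `conv2d_naive` are illustrative and may not match the ones in the neutro source.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold (N, H, W, C) into a (N * out_h * out_w, kh * kw * C) patch matrix."""
    n, h, w, c = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((n, out_h, out_w, kh, kw, c), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Each (i, j) kernel offset is a shifted view of the whole image.
            cols[:, :, :, i, j, :] = x[:, i:i + out_h, j:j + out_w, :]
    return cols.reshape(n * out_h * out_w, kh * kw * c), (n, out_h, out_w)

def conv2d_im2col(x, kernels):
    """kernels has shape (kh, kw, C_in, C_out); computes a cross-correlation."""
    kh, kw, c_in, c_out = kernels.shape
    cols, (n, out_h, out_w) = im2col(x, kh, kw)
    out = cols @ kernels.reshape(kh * kw * c_in, c_out)  # one big matmul
    return out.reshape(n, out_h, out_w, c_out)

def conv2d_naive(x, kernels):
    """Reference version that slides the window explicitly."""
    kh, kw, c_in, c_out = kernels.shape
    n, h, w, _ = x.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1, c_out))
    for r in range(out.shape[1]):
        for s in range(out.shape[2]):
            patch = x[:, r:r + kh, s:s + kw, :]  # (n, kh, kw, c_in)
            out[:, r, s, :] = np.tensordot(patch, kernels, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 5, 3))
kernels = rng.standard_normal((3, 3, 3, 4))
fast = conv2d_im2col(x, kernels)
slow = conv2d_naive(x, kernels)
print(fast.shape, np.allclose(fast, slow))
```

The price of the single matmul is memory: the patch matrix repeats each pixel up to kh * kw times.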

🛠 Features That'll Make You Say "Wait, You Implemented That?"

| Category | The "Fancy" Stuff | Why You Should Care |
| --- | --- | --- |
| Attention | `FlashAttention`, `MQA`, `GQA`, `RoPE` | We have more attention variants than a distracted toddler. |
| Tokenization | `BPETokenizer`, `RegexTokenizer` | Byte-level BPE with regex splitting, just like the big kids. |
| Vision | `AlexNet`, `VGG16`, `VGG19`, `im2col` | Classical and modern vision architectures, vectorized. |
| LLMs | `Llama`, `Qwen`, `DeepSeek` (MoE) | Yes, you can run a (very tiny) MoE model on your CPU. |
| Modern Ops | `RMSNorm`, `SiLU`, `SwiGLU` | The secret sauce of modern LLMs, hand-implemented. |
| Optimizers | `AdamW`, `Adam`, `SGD+Momentum` | Keep your weights from exploding like a bad science fair project. |
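
To show how small the "Modern Ops" really are, here is a forward-only sketch of RMSNorm, SiLU, and a SwiGLU feed-forward block in NumPy. The function names, argument names, and the default `eps` are assumptions for this example and are not taken from the neutro layers.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """Scale each feature vector by its root mean square, then apply a learned gain."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def silu(x):
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu(x, w_gate, w_up, w_down):
    """Gated feed-forward: the SiLU-activated gate multiplies the 'up' projection."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 8))   # (batch, sequence, features)
gain = np.ones(8)
w_gate = rng.standard_normal((8, 16)) * 0.1
w_up = rng.standard_normal((8, 16)) * 0.1
w_down = rng.standard_normal((16, 8)) * 0.1

h = rms_norm(x, gain)
y = swiglu(h, w_gate, w_up, w_down)
print(y.shape)
```

Unlike LayerNorm, RMSNorm skips the mean subtraction and the bias term, so the only statistic it needs is the mean of the squares.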

🏆 The Hall of Fame: Pre-built Architectures

Why build from scratch when we've already done the heavy lifting?

The Visionaries: AlexNet, VGG16, VGG19
The Linguists: GPT-2, LlamaTiny, QwenTiny, DeepSeekTiny (Mixture of Experts)

💻 Show Me The Code!

If you know Keras, you already know Neutro. It's that simple.

```python
from neutro.models import Sequential
from neutro.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Build a CNN that actually fits in your head
model = Sequential([
    Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=2),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile it like it's 2015
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit it like a tailored suit
# NOTE: train_flow is not defined in this snippet. It is assumed to be a batch
# iterator over (images, one-hot labels) that you build beforehand.
model.fit(train_flow, epochs=10)
```

📚 Documentation

Every component is documented with line-by-line walkthroughs, math, and references to the original research papers.

👉 Browse the full documentation →

🧠 Deep Dives & Nerdy Stuff

We documented everything because we know you like to check the math:

Documentation Home — Start here for the full index with research paper links.
Attention Mechanisms — How we made FlashAttention work on a CPU.
Convolutional Magic — The im2col deep dive.
Activations & Gradients — Proofs for the brave.
Autograd Engine — How we do automatic differentiation.
Optimizers — Why AdamW is better than your ex.
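
As a warm-up for the attention deep dive, here is a simplified sketch of the online-softmax idea behind FlashAttention: process keys and values one block at a time, keep a running row maximum and row sum, and never build the full score matrix. It is single-head, tiles only the key/value axis, and is not the neutro implementation; `tiled_attention`, `naive_attention`, and the `block` parameter are names assumed for this example.

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: materialize the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=4):
    """Same result, computed block by block with a running (online) softmax."""
    n, d = q.shape
    out = np.zeros((n, v.shape[-1]))
    row_max = np.full((n, 1), -np.inf)  # running max of the scores per query
    row_sum = np.zeros((n, 1))          # running softmax denominator
    for start in range(0, k.shape[0], block):
        k_blk = k[start:start + block]
        v_blk = v[start:start + block]
        scores = q @ k_blk.T / np.sqrt(d)
        new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))
        scale = np.exp(row_max - new_max)  # rescales old partial results; 0 on the first block
        p = np.exp(scores - new_max)
        row_sum = row_sum * scale + p.sum(axis=-1, keepdims=True)
        out = out * scale + p @ v_blk
        row_max = new_max
    return out / row_sum

rng = np.random.default_rng(0)
q = rng.standard_normal((6, 8))
k = rng.standard_normal((10, 8))
v = rng.standard_normal((10, 5))
print(np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v, block=4)))
```

Full FlashAttention also tiles the queries and is designed around fast on-chip memory; in NumPy on a CPU the main thing this sketch demonstrates is that the blockwise arithmetic is exact, not that it is faster.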

🧪 Examples to Flex Your CPU

Check out the examples/ folder for end-to-end scripts:

mnist_cnn.py: Standard digit classification with real-time augmentation.
wikitext_llm.py: A character-level Transformer that actually talks back.

🏗 Installation

```shell
git clone https://github.com/sourcepirate/neutro.git
cd neutro
pip install -e .
```

Disclaimer: This is a hobby project for learning and exploration. It is intentionally naive, likely inefficient compared to compiled kernels, and 100% focused on the joy of understanding advanced algorithms. If you're looking to change the world with AGI, go to PyTorch. If you're looking to understand why your Transformer works while drinking a nice cup of tea, you're in the right place.
