A scalar-valued autograd engine and neural network library built entirely from scratch in Python — no PyTorch, no TensorFlow, just pure math.
Micrograd implements backpropagation (reverse-mode autodiff) over a dynamically built Directed Acyclic Graph (DAG) of scalar values. It supports enough operations to build and train small neural networks — demonstrating the exact same core concepts that power frameworks like PyTorch under the hood.
| Component | Description |
|---|---|
| `Value` | Wraps a scalar with gradient tracking, graph connectivity, and a local `_backward` function |
| `Neuron` | `tanh(w · x + b)` — a single unit with learnable weights and bias |
| `Layer` | A collection of `Neuron` objects operating in parallel |
| `MLP` | A Multi-Layer Perceptron — layers chained sequentially |
Every arithmetic operation (+, *, **, tanh, exp) builds a computational graph. Calling .backward() on the output traverses this graph in reverse topological order, applying the chain rule at each node to compute dL/d(node) for every node.
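As a rough sketch of how one such operation records itself into the graph (a hypothetical, single-operator cut-down of the engine, not the notebook's full `Value` class):

```python
class Value:
    """Minimal sketch: a scalar node with data, grad, and DAG links."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = set(_children)      # operands that produced this node
        self._backward = lambda: None    # local chain-rule step, set by the op

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(out)/d(self) = other.data; scale by upstream grad (chain rule)
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out
```

Each operator returns a new node that remembers its inputs and how to route gradients back to them; `.backward()` then just fires these `_backward` closures in reverse topological order.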
A simple expression: `L = (a * b + c) * f`

```python
a = Value(2.0); b = Value(-3.0)
c = Value(10.0); f = Value(-2.0)

e = a * b    # -6.0
d = e + c    # 4.0
L = d * f    # -8.0
```

Gradients computed manually using the chain rule:
| Node | Formula | Gradient |
|---|---|---|
| `L` | dL/dL = 1 | 1.0 |
| `f` | dL/df = d | 4.0 |
| `d` | dL/dd = f | -2.0 |
| `c` | dL/dd · dd/dc = -2.0 × 1.0 | -2.0 |
| `e` | dL/dd · dd/de = -2.0 × 1.0 | -2.0 |
| `b` | dL/de · de/db = -2.0 × 2.0 | -4.0 |
| `a` | dL/de · de/da = -2.0 × (-3.0) | 6.0 |
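These hand-derived gradients can be double-checked numerically with central finite differences (a quick standalone sketch, not part of the notebook):

```python
def L(a, b, c, f):
    """The example expression: L = (a * b + c) * f."""
    return (a * b + c) * f

h = 1e-6
base = (2.0, -3.0, 10.0, -2.0)   # the values of a, b, c, f above
grads = []
for i in range(4):
    plus, minus = list(base), list(base)
    plus[i] += h
    minus[i] -= h
    # central difference approximates dL/d(arg i)
    grads.append((L(*plus) - L(*minus)) / (2 * h))
# grads ≈ [6.0, -4.0, -2.0, 4.0], matching the table for a, b, c, f
```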
Computational graph for Example 1 showing data and grad at each node
Models a single neuron: `o = tanh(x1·w1 + x2·w2 + b)`

```python
x1, x2 = Value(2.0), Value(0.0)    # inputs
w1, w2 = Value(-3.0), Value(1.0)   # weights
b = Value(6.8814)                  # bias

n = x1*w1 + x2*w2 + b   # 0.8814
o = n.tanh()            # 0.7071
```

The tanh derivative: do/dn = 1 - tanh(n)² = 1 - 0.7071² = 0.5
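A quick standalone check of those numbers with Python's `math` module:

```python
import math

n = 0.8814
t = math.tanh(n)      # ≈ 0.7071, the neuron's output
deriv = 1 - t ** 2    # ≈ 0.5, the local tanh derivative do/dn
```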
Computational graph for a single neuron with tanh activation
Same neuron as above, but using the automatic `backward()` method:

```python
o.backward()   # topological sort + reverse traversal — does everything!
```

This replaces all the manual gradient assignments. Results are identical to Example 2.
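Under the hood, `backward()` amounts to a depth-first topological sort followed by a reverse pass. A standalone sketch of that logic (using a stand-in `Node` class rather than the notebook's `Value`):

```python
class Node:
    """Tiny stand-in for Value with just the fields backward() needs."""
    def __init__(self, data, prev=()):
        self.data, self.grad = data, 0.0
        self._prev = prev
        self._backward = lambda: None

def backward(root):
    # 1. Topological sort of the DAG rooted at `root`
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(root)
    # 2. Seed dL/dL = 1 and apply each local chain-rule step in reverse order
    root.grad = 1.0
    for node in reversed(topo):
        node._backward()
```

The reverse order guarantees that every node's upstream gradient is fully accumulated before its own `_backward` fires.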
Proves that tanh can be broken into its raw mathematical operations and backprop still works correctly:
```python
# Instead of o = n.tanh():
e = (2 * n).exp()        # e^(2n)
o = (e - 1) / (e + 1)    # manual tanh formula
o.backward()             # all gradients match!
```
tanh decomposed into exp, subtraction, addition, and division — gradients propagate correctly through all primitives
Key Insight: You can compose any differentiable operations and the autograd engine will figure out all the gradients automatically.
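The identity behind Example 4 can be verified directly in plain Python: (e^(2n) − 1) / (e^(2n) + 1) equals tanh(n), so decomposing the activation changes nothing numerically:

```python
import math

n = 0.8814
e = math.exp(2 * n)
manual = (e - 1) / (e + 1)   # tanh built from exp, sub, add, div
builtin = math.tanh(n)       # the two agree to machine precision
```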
Multi-Layer Perceptron: inputs (blue) → hidden layers → outputs (green)
```python
class Neuron:
    """Single neuron: out = tanh(Σ(wi * xi) + b)"""
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))

class Layer:
    """Collection of neurons operating in parallel"""
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

class MLP:
    """Multi-Layer Perceptron — layers chained sequentially"""
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
```

In the notebook, `MLP(3, [4, 4, 1])` creates a network with 41 parameters:
| Layer | Shape | Parameters |
|---|---|---|
| Hidden 1 | 3 → 4 | 3×4 + 4 = 16 |
| Hidden 2 | 4 → 4 | 4×4 + 4 = 20 |
| Output | 4 → 1 | 4×1 + 1 = 5 |
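The count in the table can be reproduced with a one-liner, since each neuron owns `nin` weights plus one bias:

```python
sz = [3, 4, 4, 1]   # layer sizes for MLP(3, [4, 4, 1])
# (inputs + 1 bias) parameters per neuron, times neurons per layer
params = sum((sz[i] + 1) * sz[i + 1] for i in range(len(sz) - 1))
# 16 + 20 + 5 = 41
```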
The training process follows a four-step cycle repeated for K iterations:
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ 1. Forward │ ──▶ │ 2. Zero Grad │ ──▶ │ 3. Backward │ ──▶ │ 4. Update │
│ Pass │ │ │ │ Pass │ │ Weights │
└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
▲ │
└──────────────────────────────────────────────────────────────┘
```python
for k in range(20):
    # 1. Forward pass
    ypred = [n(x) for x in xs]
    loss = sum(((yout - ygt) ** 2 for ygt, yout in zip(ys, ypred)), Value(0.0))

    # 2. Zero gradients
    for p in n.parameters():
        p.grad = 0.0

    # 3. Backward pass
    loss.backward()

    # 4. Update weights (gradient descent)
    for p in n.parameters():
        p.data += -0.5 * p.grad   # learning rate = 0.5
```

Loss function: Mean Squared Error → L = Σ(y_pred - y_true)²
Update rule: w = w - lr × dL/dw
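The same update rule drives any gradient descent. A toy standalone example (not from the notebook) minimizing f(w) = (w − 3)²:

```python
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3.0)   # df/dw for f(w) = (w - 3)^2
    w += -lr * grad        # identical update rule: w = w - lr * dL/dw
# w converges toward the minimizer, 3.0
```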
| Micrograd | PyTorch Equivalent |
|---|---|
| `Value` | `torch.Tensor` (with `requires_grad=True`) |
| `.backward()` | `loss.backward()` |
| `p.data += -lr * p.grad` | `optimizer.step()` |
| `p.grad = 0.0` | `optimizer.zero_grad()` |
The only real difference: PyTorch operates on tensors (batched multi-dimensional arrays) instead of scalars, for GPU-accelerated efficiency.
micrograd/
├── micrograd.ipynb # Full implementation notebook
├── README.md
└── images/
├── img1.png # Example 1 — expression graph
├── img2.png # Example 2 — single neuron graph
├── img3.png # Example 4 — decomposed tanh graph
└── img4.png # MLP architecture diagram
```bash
git clone https://github.com/sherurox/micrograd.git
cd micrograd
pip install graphviz numpy matplotlib
jupyter notebook micrograd.ipynb
```

Note: You need Graphviz installed on your system for the `draw_dot` visualization to work.
- Andrej Karpathy — "The spelled-out intro to neural networks and backpropagation: building micrograd"
- Karpathy's micrograd repository
Shreyas Khandale
MS Computer Science (AI Track) — Binghamton University
GitHub · Email