🧠 Micrograd — From-Scratch Autograd Engine & Neural Network

A scalar-valued autograd engine and neural network library built entirely from scratch in Python — no PyTorch, no TensorFlow, just pure math.

📌 What is Micrograd?

Micrograd implements backpropagation (reverse-mode autodiff) over a dynamically built Directed Acyclic Graph (DAG) of scalar values. It supports enough operations to build and train small neural networks — demonstrating the exact same core concepts that power frameworks like PyTorch under the hood.

Core Components

| Component | Description |
| --- | --- |
| `Value` | Wraps a scalar with gradient tracking, graph connectivity, and a local `_backward` function |
| `Neuron` | `tanh(w · x + b)` — a single unit with learnable weights and bias |
| `Layer` | A collection of `Neuron` objects operating in parallel |
| `MLP` | A Multi-Layer Perceptron — layers chained sequentially |

🔁 How Backpropagation Works

Every arithmetic operation (+, *, **, tanh, exp) builds a computational graph. Calling .backward() on the output traverses this graph in reverse topological order, applying the chain rule at each node to compute dL/d(node) for every node.
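For readers who want the mechanism in code, here is a minimal sketch of such an engine, stripped down to just `+` and `*` (the notebook's `Value` also implements `**`, `tanh`, and `exp`, but the pattern is identical: store children, store a local `_backward` closure, then topo-sort and replay in reverse):

```python
class Value:
    """Minimal scalar with autograd for + and * (illustration only)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to push grad into children
        self._prev = set(_children)     # graph connectivity

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # build topological order, then apply the chain rule in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# Running Example 1 below through it reproduces the hand-derived gradients:
a, b = Value(2.0), Value(-3.0)
c, f = Value(10.0), Value(-2.0)
L = (a * b + c) * f
L.backward()
print(L.data, a.grad, b.grad)   # -8.0 6.0 -4.0
```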

Example 1 — Manual Gradient Computation

A simple expression: L = (a * b + c) * f

```python
a = Value(2.0);   b = Value(-3.0)
c = Value(10.0);  f = Value(-2.0)

e = a * b          # -6.0
d = e + c          #  4.0
L = d * f          # -8.0
```

Gradients computed manually using the chain rule:

| Node | Formula | Gradient |
| --- | --- | --- |
| L | dL/dL = 1 | 1.0 |
| f | dL/df = d | 4.0 |
| d | dL/dd = f | -2.0 |
| c | dL/dd · dd/dc = -2.0 × 1.0 | -2.0 |
| e | dL/dd · dd/de = -2.0 × 1.0 | -2.0 |
| b | dL/de · de/db = -2.0 × 2.0 | -4.0 |
| a | dL/de · de/da = -2.0 × (-3.0) | 6.0 |
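These hand-derived values can be sanity-checked with a central finite difference on plain Python floats, independent of any autograd machinery:

```python
def L(a, b, c, f):
    """The Example 1 expression on plain floats."""
    return (a * b + c) * f

h = 1e-6
# central difference: dL/dx ≈ (L(x+h) - L(x-h)) / (2h)
dL_da = (L(2.0 + h, -3.0, 10.0, -2.0) - L(2.0 - h, -3.0, 10.0, -2.0)) / (2 * h)
dL_db = (L(2.0, -3.0 + h, 10.0, -2.0) - L(2.0, -3.0 - h, 10.0, -2.0)) / (2 * h)
print(round(dL_da, 4), round(dL_db, 4))  # 6.0 -4.0
```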

![Example 1 expression graph](images/img1.png)
*Computational graph for Example 1 showing data and grad at each node*


Example 2 — Single Neuron with Manual Backprop

Models a single neuron: o = tanh(x1·w1 + x2·w2 + b)

```python
x1, x2 = Value(2.0), Value(0.0)       # inputs
w1, w2 = Value(-3.0), Value(1.0)      # weights
b      = Value(6.8814)                # bias

n = x1*w1 + x2*w2 + b                 # 0.8814
o = n.tanh()                          # 0.7071
```

The tanh derivative: do/dn = 1 - tanh(n)² = 1 - 0.7071² = 0.5
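A quick numerical check of that identity, using plain `math` rather than `Value` objects:

```python
import math

n = 0.8814
h = 1e-6
# numerical derivative of tanh at n vs. the analytic identity 1 - tanh(n)^2
numeric = (math.tanh(n + h) - math.tanh(n - h)) / (2 * h)
analytic = 1 - math.tanh(n) ** 2
print(round(numeric, 4), round(analytic, 4))  # both ≈ 0.5
```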

![Example 2 single neuron graph](images/img2.png)
*Computational graph for a single neuron with tanh activation*


Example 3 — Automatic Backprop

Same neuron as above, but using the automatic backward() method:

```python
o.backward()  # topological sort + reverse traversal — does everything!
```

This replaces all the manual gradient assignments. Results are identical to Example 2.


Example 4 — Decomposing tanh into Primitives

Proves that tanh can be broken into its raw mathematical operations and backprop still works correctly:

```python
# Instead of o = n.tanh():
e = (2 * n).exp()          # e^(2n)
o = (e - 1) / (e + 1)      # manual tanh formula
o.backward()               # all gradients match!
```

![Example 4 decomposed tanh graph](images/img3.png)
*tanh decomposed into exp, subtraction, addition, and division — gradients propagate correctly through all primitives*

Key Insight: You can compose any differentiable operations and the autograd engine will figure out all the gradients automatically.
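The equivalence itself is plain algebra, tanh(n) = (e^(2n) − 1)/(e^(2n) + 1), and can be confirmed with floats alone:

```python
import math

n = 0.8814
e = math.exp(2 * n)
o = (e - 1) / (e + 1)          # the decomposed form
print(round(o, 4), round(math.tanh(n), 4))  # 0.7071 0.7071
```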


🏗️ Neural Network Architecture

![MLP architecture diagram](images/img4.png)
*Multi-Layer Perceptron: inputs (blue) → hidden layers → outputs (green)*

Building Blocks

```python
import random

class Neuron:
    """Single neuron: out = tanh(Σ(wi * xi) + b)"""
    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(random.uniform(-1, 1))

class Layer:
    """Collection of neurons operating in parallel"""
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]

class MLP:
    """Multi-Layer Perceptron — layers chained sequentially"""
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i+1]) for i in range(len(nouts))]
```
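The constructors above only build the parameters; in the notebook each class is also callable to run the forward pass. Roughly how that looks, sketched here with plain floats and `math.tanh` so it runs standalone (the real version passes `Value` objects through so gradients can flow):

```python
import math
import random

class Neuron:
    def __init__(self, nin):
        self.w = [random.uniform(-1, 1) for _ in range(nin)]
        self.b = random.uniform(-1, 1)
    def __call__(self, x):
        # out = tanh(Σ(wi * xi) + b)
        return math.tanh(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

class Layer:
    def __init__(self, nin, nout):
        self.neurons = [Neuron(nin) for _ in range(nout)]
    def __call__(self, x):
        outs = [n(x) for n in self.neurons]
        return outs[0] if len(outs) == 1 else outs

class MLP:
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = [Layer(sz[i], sz[i + 1]) for i in range(len(nouts))]
    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)    # each layer's output feeds the next
        return x

net = MLP(3, [4, 4, 1])
y = net([2.0, 3.0, -1.0])
print(-1.0 < y < 1.0)   # True — the tanh output always lies in (-1, 1)
```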

In the notebook: MLP(3, [4, 4, 1]) creates a network with 41 parameters:

| Layer | Shape | Parameters |
| --- | --- | --- |
| Hidden 1 | 3 → 4 | 3×4 + 4 = 16 |
| Hidden 2 | 4 → 4 | 4×4 + 4 = 20 |
| Output | 4 → 1 | 4×1 + 1 = 5 |
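The count follows directly from the layer sizes (`nin × nout` weights plus `nout` biases per layer) and can be checked in a couple of lines:

```python
sz = [3, 4, 4, 1]   # MLP(3, [4, 4, 1])
# per layer: nin*nout weights + nout biases
params = sum(sz[i] * sz[i + 1] + sz[i + 1] for i in range(len(sz) - 1))
print(params)  # 41
```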

🔄 Training Loop

The training process follows a four-step cycle, repeated for a fixed number of iterations (20 in the notebook):

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  1. Forward  │ ──▶ │ 2. Zero Grad │ ──▶ │ 3. Backward  │ ──▶ │  4. Update   │
│     Pass     │     │              │     │     Pass     │     │   Weights    │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
       ▲                                                              │
       └──────────────────────────────────────────────────────────────┘
```

```python
for k in range(20):
    # 1. Forward pass
    ypred = [n(x) for x in xs]
    loss = sum(((yout - ygt) ** 2 for ygt, yout in zip(ys, ypred)), Value(0.0))

    # 2. Zero gradients
    for p in n.parameters():
        p.grad = 0.0

    # 3. Backward pass
    loss.backward()

    # 4. Update weights (gradient descent)
    for p in n.parameters():
        p.data += -0.5 * p.grad  # learning rate = 0.5
```

Loss function: sum of squared errors → L = Σ(y_pred − y_true)² (squared-error loss without the 1/N averaging of true MSE)
Update rule: w ← w − lr × dL/dw
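To see why the update rule works, apply it to a toy one-dimensional loss L(w) = (w − 3)²: the gradient 2(w − 3) always points uphill, so repeatedly stepping against it drives w toward the minimum at 3.

```python
w, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (w - 3.0)   # dL/dw for L = (w - 3)^2
    w += -lr * grad        # w = w - lr * dL/dw
print(round(w, 3))  # 3.0
```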


🔗 Mapping Micrograd → PyTorch

| Micrograd | PyTorch Equivalent |
| --- | --- |
| `Value` | `torch.Tensor` (with `requires_grad=True`) |
| `.backward()` | `loss.backward()` |
| `p.data += -lr * p.grad` | `optimizer.step()` |
| `p.grad = 0.0` | `optimizer.zero_grad()` |

The only real difference: PyTorch operates on tensors (batched multi-dimensional arrays) instead of scalars, for GPU-accelerated efficiency.


📂 Repository Structure

```
micrograd/
├── micrograd.ipynb       # Full implementation notebook
├── README.md
└── images/
    ├── img1.png          # Example 1 — expression graph
    ├── img2.png          # Example 2 — single neuron graph
    ├── img3.png          # Example 4 — decomposed tanh graph
    └── img4.png          # MLP architecture diagram
```

🚀 Getting Started

```shell
git clone https://github.com/sherurox/micrograd.git
cd micrograd
pip install graphviz numpy matplotlib
jupyter notebook micrograd.ipynb
```

Note: Graphviz must also be installed system-wide (the `graphviz` pip package is only bindings) for the `draw_dot` visualization to work.


👤 Author

Shreyas Khandale
MS Computer Science (AI Track) — Binghamton University
GitHub · Email
