This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
```bash
npm run dev   # Start Vite dev server
```

```
src/
├── GPUEnv.ts                 # WebGPU device singleton
├── main.ts                   # Entry point
├── Trainer.ts                # Training loop orchestrator
├── Tester.ts                 # Test/evaluation harness
├── model/
│   └── Datasource.ts         # Datasource interface
├── MNIST/
│   ├── MNIST.ts              # MNIST model definition
│   └── MNISTDatasource.ts    # MNIST data loading
├── layer/
│   ├── Layer.ts              # Base layer interface
│   ├── Linear.ts             # Fully connected layer
│   ├── Sequential.ts         # Layer container
│   ├── ReLU.ts               # ReLU activation layer
│   └── Dropout.ts            # Dropout regularization layer
├── math/
│   └── Utils.ts              # Weight initializers (heNormal, heUniform)
├── autograd/
│   ├── GradientFunction.ts   # Interface for backward functions
│   ├── BackwardPass.ts       # Reverse topological sort for backprop
│   └── backward/
│       ├── MatMulBackward.ts
│       ├── BiasAddBackward.ts
│       ├── ReLUBackward.ts
│       ├── DropoutBackward.ts
│       ├── MatAddBackward.ts
│       ├── SoftmaxBackward.ts
│       └── SoftmaxCrossEntropyBackward.ts
├── optimizer/
│   ├── Optimizer.ts          # Optimizer interface with LR scheduling
│   └── SGD.ts                # SGD with learning rate scheduling
└── tensor/
    ├── Tensor.ts             # GPU-backed tensor with autograd support
    ├── TensorManager.ts      # Buffer lifecycle management
    └── kernel/
        ├── Kernel.ts                  # Base kernel class
        ├── KernelRegistry.ts          # Kernel instantiation
        ├── MatMulKernel.ts
        ├── MatAddKernel.ts
        ├── BiasAddKernel.ts
        ├── RELUKernel.ts
        ├── DropoutKernel.ts
        ├── SoftmaxKernel.ts
        ├── CrossEntropyKernel.ts
        ├── TransposeKernel.ts         # Autograd support
        ├── SumReduceKernel.ts         # Autograd support
        ├── ElementwiseMulKernel.ts    # Autograd support
        ├── ReLUBackwardKernel.ts      # Autograd support
        ├── SoftmaxBackwardKernel.ts   # Autograd support
        ├── SoftmaxCEBackwardKernel.ts # Autograd support
        ├── ScalarMulKernel.ts         # Optimizer support
        ├── InplaceAddKernel.ts        # Optimizer support
        └── SumAllKernel.ts            # Optimizer support
```
WebGPU-based tensor computation library for MNIST, built with TypeScript and Vite.
Core components:

- GPUEnv (`src/GPUEnv.ts`): Singleton that initializes the WebGPU device. Throws on failure. Must call `await GPUEnv.init()` before using `GPUEnv.device`.
- Tensor (`src/tensor/Tensor.ts`): Wraps a `GPUBuffer` with shape metadata. Row-major storage. Size computed via `shape.reduce((a, b) => a * b, 1)`.
- TensorManager (`src/tensor/TensorManager.ts`): Manages GPU buffer lifecycle. Handles buffer reuse, 256-byte alignment, readback buffers, and deferred destruction. Use `getTensorBuffer()` to create/reuse tensors by name.
- KernelRegistry (`src/tensor/kernel/KernelRegistry.ts`): Central registry for GPU kernels. Instantiates all kernels with a shared device and TensorManager.
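A typical startup sequence wires these together. The sketch below is illustrative only: `GPUEnv.init()` and `GPUEnv.device` are documented above, but the `TensorManager` and `KernelRegistry` constructor arguments are assumptions.

```typescript
// Hypothetical startup sketch; constructor signatures are assumed.
import { GPUEnv } from "./GPUEnv";
import { TensorManager } from "./tensor/TensorManager";
import { KernelRegistry } from "./tensor/kernel/KernelRegistry";

await GPUEnv.init();                       // must run first; throws if WebGPU is unavailable
const device = GPUEnv.device;              // safe to access only after init()
const tm = new TensorManager(device);      // assumed: manager owns buffer lifecycle
const kr = new KernelRegistry(device, tm); // assumed: registry instantiates all kernels
```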
All kernels extend the `Kernel` base class (`src/tensor/kernel/Kernel.ts`), which creates the shader module and compute pipeline.

Kernels follow a functional pattern: they return tensors, enabling composition:

```typescript
const out = kernelRegistry.relu.run(
  kernelRegistry.matmul.run(t0, t1)
);
```

Each kernel:
- Validates input tensor shapes (must be 2D)
- Auto-creates output buffer if not provided (via TensorManager)
- Writes params to uniform buffer
- Creates bind group, dispatches compute, submits to queue
- Returns output tensor
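These steps map roughly onto the following WebGPU calls. This is a hedged sketch of the pattern, not the actual `Kernel` implementation: only the standard WebGPU API is real; the bind-group layout, the `Tensor.buffer` field, and the params layout are assumptions.

```typescript
// Sketch of the per-kernel run() pattern for a binary kernel (names assumed).
import { Tensor } from "../Tensor";

function runKernel(
  device: GPUDevice,
  pipeline: GPUComputePipeline,
  paramsBuffer: GPUBuffer,
  a: Tensor,
  b: Tensor,
  out: Tensor, // in the repo, TensorManager auto-creates this if not provided
): Tensor {
  // 1. Validate input tensor shapes (must be 2D).
  if (a.shape.length !== 2 || b.shape.length !== 2) throw new Error("inputs must be 2D");

  // 2. Write params to the uniform buffer.
  device.queue.writeBuffer(paramsBuffer, 0, new Uint32Array([a.shape[0], a.shape[1], b.shape[1]]));

  // 3. Create bind group, dispatch compute, submit to queue.
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: a.buffer } },
      { binding: 1, resource: { buffer: b.buffer } },
      { binding: 2, resource: { buffer: out.buffer } },
      { binding: 3, resource: { buffer: paramsBuffer } },
    ],
  });
  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(b.shape[1] / 16), Math.ceil(a.shape[0] / 16), 1);
  pass.end();
  device.queue.submit([encoder.finish()]);

  // 4. Return the output tensor so calls compose.
  return out;
}
```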
Forward Kernels:
- MatMulKernel: Tiled 16×16 matrix multiplication with shared memory
- MatAddKernel: Element-wise addition of two 2D tensors with matching shapes
- BiasAddKernel: Broadcasts 1D bias [N] across rows of 2D input [M,N]
- RELUKernel: Element-wise ReLU activation
- DropoutKernel: Element-wise multiplication with pre-computed mask
- SoftmaxKernel: Per-row softmax with numerical stability (optimized for small N like MNIST's 10 classes); see the CPU sketch after this list
- CrossEntropyKernel: Per-sample cross-entropy loss. Takes predictions [M,N] and one-hot labels [M,N], outputs loss [M,1]
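For reference, the numerical-stability trick is max-subtraction per row: shifting by the row maximum leaves the result unchanged but keeps `exp()` from overflowing. A CPU sketch of the same math (illustrative only; the real kernel is WGSL):

```typescript
// Numerically stable per-row softmax: subtract the row max before exp().
// CPU reference sketch, not the actual SoftmaxKernel shader.
function softmaxRow(row: Float32Array): Float32Array {
  const max = row.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = row.map((x) => Math.exp(x - max)); // shifted exponentials, all ≤ 1
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);                // normalize to probabilities
}
```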
Autograd Support Kernels:
- TransposeKernel: Matrix transpose [M,N] → [N,M]
- SumReduceKernel: Sum along axis 0, [M,N] → [1,N]
- ElementwiseMulKernel: Hadamard product of two tensors
- ReLUBackwardKernel: `dX = dY * (X > 0)`
- SoftmaxBackwardKernel: Jacobian-vector product for softmax
- SoftmaxCEBackwardKernel: Combined softmax + cross-entropy backward: `dLogits = probs - labels`
Optimizer Support Kernels:
- ScalarMulKernel: `output = input * scalar` (for learning rate scaling)
- InplaceAddKernel: `target += source` (for parameter updates)
- SumAllKernel: Reduces all elements to a scalar [1,1] (for loss reduction)
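The first two kernels are enough to express a plain SGD update. A hedged sketch, assuming the registry property names (`kr.scalarMul`, `kr.inplaceAdd`) match the kernel file names:

```typescript
import { Tensor } from "./tensor/Tensor";
import { KernelRegistry } from "./tensor/kernel/KernelRegistry";

// Plain SGD step built from the optimizer-support kernels (names assumed).
// param += grad * (-lr / batchSize)  ==  param = param - lr * grad / batchSize
function sgdStep(kr: KernelRegistry, params: Tensor[], lr: number, batchSize: number): void {
  for (const param of params) {
    if (!param.gradient) continue;                              // no gradient accumulated
    const scaled = kr.scalarMul.run(param.gradient, -lr / batchSize); // ScalarMulKernel
    kr.inplaceAdd.run(param, scaled);                           // InplaceAddKernel
  }
}
```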
Shaders use 16×16 workgroup size (except Softmax/CrossEntropy which use 256×1). Dispatch: (ceil(N/16), ceil(M/16), 1).
MatMul uses tiled algorithm with workgroup-local shared memory and barrier synchronization for coalesced memory access.
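In TypeScript terms, the dispatch geometry for an [M,N] output is one invocation per element, rounded up to cover partial tiles (standard WebGPU API; the helper name is illustrative):

```typescript
// Dispatch an [M, N] output with 16×16 workgroups: (ceil(N/16), ceil(M/16), 1).
function dispatch2D(pass: GPUComputePassEncoder, M: number, N: number): void {
  pass.dispatchWorkgroups(Math.ceil(N / 16), Math.ceil(M / 16), 1);
}
```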
To add a new kernel (a skeleton is sketched below):

- Create a class extending `Kernel` in `src/tensor/kernel/`
- Define the WGSL shader as a static string
- Add shape validation, params buffer, and bind group creation
- Register it in `KernelRegistry`
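A minimal skeleton for those steps. This is hypothetical: the `Kernel` base-class API is assumed, and the 1D workgroup here is for brevity (the repo's convention is 16×16):

```typescript
// Hypothetical new-kernel skeleton; base-class constructor API is assumed.
import { Kernel } from "./Kernel";

const NEGATE_WGSL = /* wgsl */ `
  @group(0) @binding(0) var<storage, read> input: array<f32>;
  @group(0) @binding(1) var<storage, read_write> output: array<f32>;
  @compute @workgroup_size(256)
  fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    if (gid.x < arrayLength(&input)) {
      output[gid.x] = -input[gid.x];   // elementwise negate
    }
  }
`;

export class NegateKernel extends Kernel {
  // Step 1+2: extend Kernel and hand it the static WGSL source.
  // Step 3: run() validates 2D shapes, writes params, builds the bind group,
  //         dispatches (see the run() pattern sketched earlier).
  // Step 4: add `negate = new NegateKernel(...)` to KernelRegistry.
}
```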
All layers implement the `Layer` interface (`src/layer/Layer.ts`):

```typescript
interface Layer {
  forward(input: Tensor): Tensor;
  backward(input: Tensor): void;
  parameters(): Tensor[];
}
```

Available Layers:
- Linear: Fully connected layer, `output = input * W + b`
  - Weight layout: `[inputFeatures, outputFeatures]` (row-major)
  - Input: `[batch, inputFeatures]`, Output: `[batch, outputFeatures]`
- ReLU: ReLU activation as a layer wrapper
- Dropout: Regularization layer with inverted dropout scaling
  - Mask generated on CPU, applied on GPU
- Sequential: Container for stacking layers

```typescript
const model = new Sequential(
  new Linear(tm, kr, {...}),
  new ReLU(tm, kr, "relu"),
  new Dropout(tm, kr, "dropout", 0.5),
);
const out = model.forward(input);
```
`src/math/Utils.ts` provides:

- heNormal: He normal initialization, `N(0, sqrt(2/fanIn))`
- heUniform: He uniform initialization, `U(-limit, limit)` where `limit = sqrt(6/fanIn)`
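A CPU sketch of both formulas (illustrative; the actual `Utils.ts` signatures may differ):

```typescript
// He uniform: sample U(-limit, limit) with limit = sqrt(6 / fanIn).
function heUniform(fanIn: number, size: number): Float32Array {
  const limit = Math.sqrt(6 / fanIn);
  return Float32Array.from({ length: size }, () => (Math.random() * 2 - 1) * limit);
}

// He normal: sample N(0, sqrt(2 / fanIn)) via the Box-Muller transform.
function heNormal(fanIn: number, size: number): Float32Array {
  const std = Math.sqrt(2 / fanIn);
  return Float32Array.from({ length: size }, () => {
    const u = 1 - Math.random(); // avoid log(0)
    const v = Math.random();
    return std * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  });
}
```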
Automatic differentiation via computation graph tracking.
Tensor Autograd Fields (src/tensor/Tensor.ts):
gradient?: Tensor; // Accumulated gradient
gradFn?: GradientFunction; // Backward function for this node
parents?: Tensor[]; // Input tensors that created this tensor
requiresGradient: boolean; // Whether to track gradientsGradientFunction Interface (src/autograd/GradientFunction.ts):
```typescript
interface GradientFunction {
  name: string;
  savedTensors: Tensor[];                   // Tensors saved for backward
  backward(gradOutput: Tensor): Tensor[];   // Compute gradients w.r.t. inputs
}
```

Backpropagation (`src/autograd/BackwardPass.ts`):
- Build reverse topological order starting from the loss
- Initialize the loss gradient to ones
- Traverse in reverse order, calling `gradFn.backward()` on each node
- Accumulate gradients in parent tensors via `inplaceAdd`
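The four steps amount to a depth-first topological sort followed by a reverse sweep. A hedged sketch of the algorithm, not the actual `BackwardPass.ts` code; the `onesLike` and `accumulate` helpers stand in for the repo's ones-initialization and `inplaceAdd`:

```typescript
import { Tensor } from "../tensor/Tensor";

// Sketch of the backward pass: reverse-topological traversal from the loss.
function backwardPass(
  loss: Tensor,
  onesLike: (t: Tensor) => Tensor,                   // assumed helper
  accumulate: (into: Tensor, grad: Tensor) => void,  // inplaceAdd in the repo
): void {
  // 1. Build topological order via DFS over `parents`.
  const order: Tensor[] = [];
  const seen = new Set<Tensor>();
  (function visit(t: Tensor) {
    if (seen.has(t)) return;
    seen.add(t);
    for (const p of t.parents ?? []) visit(p);
    order.push(t); // post-order: parents precede the tensors they produced
  })(loss);

  // 2. Seed the loss gradient with ones.
  loss.gradient = onesLike(loss);

  // 3+4. Walk in reverse, call each node's backward, accumulate into parents.
  for (const t of order.reverse()) {
    if (!t.gradFn || !t.gradient) continue;
    const grads = t.gradFn.backward(t.gradient);       // gradients w.r.t. inputs
    (t.parents ?? []).forEach((p, i) => {
      if (!p.requiresGradient) return;
      if (p.gradient) accumulate(p.gradient, grads[i]); // accumulate, don't overwrite
      else p.gradient = grads[i];
    });
  }
}
```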
Backward Functions (`src/autograd/backward/`):

- MatMulBackward: `dA = dC @ Bᵀ`, `dB = Aᵀ @ dC` (sketched below)
- BiasAddBackward: `dX = dY`, `dBias = sum(dY, axis=0)`
- ReLUBackward: `dX = dY * (X > 0)`
- DropoutBackward: `dX = dY * mask * scale`
- SoftmaxCrossEntropyBackward: `dLogits = probs - labels`
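For example, MatMulBackward can be expressed directly with the transpose and matmul kernels. A hedged sketch, assuming the registry property names `kr.matmul` and `kr.transpose`:

```typescript
import { Tensor } from "../tensor/Tensor";
import { KernelRegistry } from "../tensor/kernel/KernelRegistry";

// Backward of C = A @ B with A: [M,K], B: [K,N], dC: [M,N].
function matMulBackward(kr: KernelRegistry, dC: Tensor, A: Tensor, B: Tensor): [Tensor, Tensor] {
  const dA = kr.matmul.run(dC, kr.transpose.run(B)); // dA = dC @ Bᵀ → [M,K]
  const dB = kr.matmul.run(kr.transpose.run(A), dC); // dB = Aᵀ @ dC → [K,N]
  return [dA, dB];
}
```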
Optimizer Interface (`src/optimizer/Optimizer.ts`):

```typescript
type LRSchedule =
  | { type: "constant" }
  | { type: "step"; factor: number; everyNSteps: number }
  | { type: "exponential"; decayRate: number }
  | { type: "cosine"; minLr: number; maxSteps: number };

interface Optimizer {
  step(batchSizeOverride?: number): void;  // Apply gradients to parameters
  zeroGrad(): void;                        // Reset all gradients to zero
  getLearningRate(): number;
  setLearningRate(lr: number): void;
  setSchedule(schedule: LRSchedule): void;
}
```

SGD Optimizer (`src/optimizer/SGD.ts`):
```typescript
const optimizer = new SGD(model.parameters(), learningRate, tm, kr, batchSize);
optimizer.setSchedule({ type: "cosine", minLr: 0.001, maxSteps: totalSteps });

optimizer.zeroGrad();              // Before forward pass
// ... forward, backward ...
optimizer.step(currentBatchSize);  // Update: param = param - lr * grad / batchSize
```

Features:
- Learning rate scheduling (constant, step decay, exponential decay, cosine annealing); see the sketch after this list
- Batch size normalization of gradients
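How each schedule variant might map a step count to a learning rate, using the standard formulas (a sketch; the repo's exact update rules and internal state are not shown here):

```typescript
// Standard formulas for the four LRSchedule variants (illustrative).
function scheduledLr(base: number, step: number, s: LRSchedule): number {
  switch (s.type) {
    case "constant":
      return base;
    case "step":        // multiply by `factor` every N steps
      return base * Math.pow(s.factor, Math.floor(step / s.everyNSteps));
    case "exponential": // smooth exponential decay per step
      return base * Math.pow(s.decayRate, step);
    case "cosine": {    // cosine annealing from base down to minLr
      const t = Math.min(step / s.maxSteps, 1);
      return s.minLr + 0.5 * (base - s.minLr) * (1 + Math.cos(Math.PI * t));
    }
  }
}
```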
MNIST Class (`src/MNIST/MNIST.ts`):

```typescript
class MNIST {
  readonly model: Sequential;
  constructor(tm: TensorManager, kernelRegistry: KernelRegistry, initializer = heUniform);
  async readSnapshot(): Promise<void>;  // Load pre-trained weights
  async restart(): Promise<void>;       // Reinitialize weights
}
```

The MNIST model is a 2-layer MLP: 784 → 128 (ReLU) → 10.
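In terms of the layers above, that architecture corresponds to something like the following sketch; the `Linear` option names are assumptions, so consult `src/MNIST/MNIST.ts` for the real construction:

```typescript
// 784 → 128 (ReLU) → 10, built from the documented layers (options assumed).
const model = new Sequential(
  new Linear(tm, kr, { inputFeatures: 784, outputFeatures: 128 }),
  new ReLU(tm, kr, "relu1"),
  new Linear(tm, kr, { inputFeatures: 128, outputFeatures: 10 }),
);
```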
Trainer (`src/Trainer.ts`):

- Manages the training loop with `requestAnimationFrame` for a non-blocking UI
- Supports epoch-based training with configurable batch size
- Handles LR scheduling via cosine annealing
- State machine: `idle → training → finished` (with cancellation support)
Tester (`src/Tester.ts`):

- Evaluates model accuracy on the test set
- Non-blocking batch processing via `requestAnimationFrame`
- Computes accuracy by comparing the argmax of predictions vs. labels
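The accuracy comparison, sketched over CPU-side arrays (illustrative; the Tester first reads logits back from the GPU):

```typescript
// Accuracy = fraction of rows where argmax(logits) equals the true label.
// `logits` is row-major [batch, numClasses]; `labels` holds class indices.
function accuracy(logits: Float32Array, labels: Uint8Array, numClasses: number): number {
  let correct = 0;
  for (let row = 0; row < labels.length; row++) {
    let best = 0;
    for (let col = 1; col < numClasses; col++) {
      if (logits[row * numClasses + col] > logits[row * numClasses + best]) best = col;
    }
    if (best === labels[row]) correct++;
  }
  return correct / labels.length;
}
```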
```typescript
// Using the Trainer class
const trainer = new Trainer(tm, kr, mnist, datasource, onFinished, onUpdate);
await trainer.initialize();
trainer.startTraining();

// Or a manual loop
optimizer.zeroGrad();
const logits = model.forward(input, true);
const loss = kr.crossEntropy.run(logits, labelsOneHot);
computeBackwardPass(tm, kr, loss);  // Backpropagation
optimizer.step(batchSize);          // Update parameters
```