A compact machine learning runtime for developers who want to understand, control, and optimize the full stack.
Native C core, modern Python API, no runtime dependencies, no bloat.
Documentation »
Qwen3 Inference Example
·
Autoencoder Training Example
·
GPT-2 Inference Example
Magnetron is a machine learning runtime built from scratch in C, with a small modern Python interface for usability.
It implements its own tensor system, operator set, autograd engine, and execution model - without relying on large external frameworks.
The goal is simple:
Keep the stack small enough to understand and be hackable, but powerful enough to run real models.
This makes Magnetron useful in two situations:
- when you want full control over execution and memory
- when you want a clean base for experimentation or new ideas
Magnetron is not trying to compete with PyTorch on ecosystem or feature count.
Instead, it optimizes for a different axis:
| Magnetron | PyTorch |
|---|---|
| Small, inspectable core | Large, layered system |
| Explicit execution | Implicit / abstracted |
| Minimal dependencies | Heavy runtime |
| Easy to modify kernels | Harder to reason about backend |
| Good for research & systems work | Good for production & scale |
If you want to:
- understand how your model actually runs
- experiment with kernels, memory layouts, or execution
- port ML workloads to unusual hardware
Magnetron gives you a much shorter path.
Magnetron is built as a single, cohesive runtime, not a collection of loosely coupled libraries.
-
Tensor system
Owns dtype, shape, strides, and memory – supports a full view system with a view solver, enabling complex slicing, reshaping, and broadcasting semantics similar to PyTorch while remaining explicit and predictable. -
Execution model
Eager execution with a dynamic autograd graph (reverse-mode), constructed per forward pass and traversed during backward. -
Operator backend
Central dispatch layer mapping high-level operations to architecture-specific kernel implementations. -
CPU backend
Multi-dispatch design with compile-time optimized kernels for a wide range of microarchitectures (Intel, AMD Zen1–Zen5, ARM).
At runtime, CPUID-based detection selects the most optimal kernel path automatically.
Supports multiple SIMD ISAs and extensions, including SSE (1–4), AVX, AVX2, FMA, AVX-512, AVX-512-BF16, AVX-512-FP16, F16C and ARM NEON, combined with multithreaded execution. -
CUDA backend (in progress)
Kernel layer is implemented - Memory management, execution pipeline, and integration are actively being completed. -
Serialization
Native.magformat designed for zero-copy, memory-mapped loading, enabling fast startup and efficient large model handling.
Conversion tools are provided to import weights from external formats. -
Backend extensibility
The architecture is intentionally clean and modular, making it straightforward to introduce new backends or target additional hardware platforms.
The system is intentionally kept tight and explicit, so each layer is understandable, controllable, and replaceable without hidden complexity.
-
Practical, not just educational
Capable of running modern LLM inference (e.g. Qwen3 in BF16), not just toy models. -
Small, controllable ML runtime
Designed to stay inspectable end-to-end — no hidden execution layers or opaque backends. -
True ownership of execution
You can reason about memory layout, kernel dispatch, and graph behavior without abstraction barriers. -
Hardware-aware by design
Not a generic backend wrapper — kernels and execution are written with specific ISAs and microarchitectures in mind. -
Zero-copy model loading
Memory-mapped.magformat enables fast startup and efficient handling of large models. -
Built for experimentation
Easy to modify operators, add kernels, or prototype new execution strategies. -
Minimal runtime surface
Native extension with no required Python dependencies — easy to deploy and embed.
End-to-end demos live under examples/.
| Path | Description |
|---|---|
| examples/qwen3/ | Qwen3 transformer inference in bfloat16 with tokenizer integration, .mag weights, CLI chat, and HTTP/streaming API. |
| examples/gpt2/ | GPT-2 causal language model inference with KV cache, token streaming, and configurable generation. |
| examples/ae/ | Convolutional autoencoder with training loop and reconstruction visualization. |
| examples/linear_regression/ | Simple 1D regression with SGD and loss tracking. |
| examples/xor/ | Minimal MLP demonstrating autograd and optimization. |
Magnetron provides a compact but expressive operator set covering:
- elementwise operations (add, mul, div, ...)
- reductions (sum, mean, ...)
- tensor transformations (view, reshape, permute, ...)
- neural building blocks (matmul, softmax, layernorm, ...)
- type casting and memory views
A full reference of operators, data types, and semantics is available here:
Magnetron is available on PyPI.
Make sure you are inside a Python virtual environment.
pip install magnetronor with uv:
uv pip install magnetronClone the repository and install locally:
git clone --recursive https://github.com/MarioSieg/magnetron
cd magnetron
uv pip install . -vFor C/C++ development, open the project root (containing CMakeLists.txt) in an IDE such as CLion.
from magnetron import Tensor, nn, optim
x = Tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = Tensor([[0.0], [1.0], [1.0], [0.0]])
model = nn.Sequential(
nn.Linear(2, 2),
nn.Tanh(),
nn.Linear(2, 1),
nn.Tanh(),
)
optimizer = optim.SGD(model.parameters(), lr=1e-1)
criterion = nn.MSELoss()
for epoch in range(2000):
y_hat = model(x)
loss = criterion(y_hat, y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
if epoch % 100 == 0:
print(f"Epoch {epoch:4d} | Loss {loss.item():.6f}")
y_hat = model(x)
for i in range(x.shape[0]):
print(f"Expected: {y[i].item():.1f}, Predicted: {y_hat[i].item():.4f}")-
🚧 CUDA backend
Finish memory model, execution pipeline, and stabilize for production use. -
🚧 Multi-GPU execution
Introduce scalable execution across multiple devices. -
🚧 New CPU architectures
Support for LoongArch and RISC-V. -
🧪 JIT compilation
Custom SSA-based IR with register allocation and target-specific instruction emission.
Magnetron started in 2024 as a personal project to understand how machine learning frameworks work internally: tensor storage, operator dispatch, autograd, and inference execution. What began as a learning project gradually evolved into a full runtime with its own tensor engine, native snapshot format, SIMD-specialized CPU backend, and support for running modern models such as Qwen3 in BF16. Today, Magnetron is developed both as a practical inference/runtime system and as a research platform for experimenting with new backends, execution strategies, and low-level ML systems ideas.
(c) 2026 Mario Sieg - mario.sieg.64@gmail.com
Distributed under the Apache 2 License.
Developed in Berlin, Germany.