Borg - European Graphics Processing Unit

Foundational workflow for an open-source GPU

The Borg (Bring yer Own GRaphics) project—supported by NLnet—is establishing a fully transparent, end-to-end silicon implementation flow for open-source GPU hardware using a 100% libre EDA toolchain. Recognizing that full GPU development is highly complex, the initiative capitalizes on recent advances in low-cost chip manufacturing to make individual tape-outs feasible for small teams.

📖 Read the Borg GPU Book for detailed documentation.

Architecture

The design is a TinyQV RISC-V SoC with the Borg FP16 shader processor as a memory-mapped peripheral, targeting both iCE40 FPGAs (pico-ice) and ASIC (IHP SG13G2 via Tiny Tapeout).

Borg Shader Processor

A minimal programmable shading unit with:

FP16 Fused Multiply-Add (FMA) — IEEE-754 compliant HardFloat unit supporting ADD, MUL, FMA, FNEG, FSTEP, and FRCP operations
32 general-purpose FP16 registers (r0–r31, expanding to 64), MMIO-accessible from the CPU
32-word instruction memory for shader programs
Hardware FP16 reciprocal (RCP) — LUT + linear interpolation for perspective division
4-cycle pipeline with automatic halt-on-zero-instruction
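
The LUT-plus-linear-interpolation reciprocal listed above can be illustrated in software. This is only a sketch of the general technique, not the Borg RTL: it uses Python floats rather than FP16, and the table size (`LUT_BITS`) and the function name `rcp_lut` are assumptions made for illustration. Infinities, NaNs, and subnormals are deliberately not handled.

```python
import math

LUT_BITS = 4                # hypothetical table size: 16 segments over [1, 2)
LUT_SIZE = 1 << LUT_BITS
# One extra entry so every segment has both of its endpoints in the table.
RCP_LUT = [1.0 / (1.0 + i / LUT_SIZE) for i in range(LUT_SIZE + 1)]


def rcp_lut(x: float) -> float:
    """Approximate 1/x with a table lookup plus linear interpolation."""
    if x == 0.0:
        return math.copysign(math.inf, x)
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1          # renormalise so m is in [1, 2)
    pos = (m - 1.0) * LUT_SIZE     # position inside the table
    idx = int(pos)                 # segment index (top mantissa bits)
    frac = pos - idx               # remaining mantissa bits drive the interpolation
    y = RCP_LUT[idx] + (RCP_LUT[idx + 1] - RCP_LUT[idx]) * frac
    return sign * math.ldexp(y, -e)  # 1/(m * 2**e) == (1/m) * 2**-e
```

With 16 segments the relative error of this sketch stays below roughly 0.1%, which is in the same range as FP16's 10-bit mantissa. A hardware table would be sized against the target precision and area budget.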

Rendering Pipeline

The firmware implements a full triangle rendering pipeline:

Vertex Shader — 4×4 MVP matrix multiply with hardware perspective division, executed as a single shader pass on the Borg FPU
Screen-Space Translation — NDC to pixel coordinates with configurable framebuffer resolution (up to 64×64)
Rasterization — Hardware-iterator driven edge evaluation with native FP16 coordinate expansion and FSM auto-chaining
Fragment Shader — Unified pass (compiled via linear scan allocator) performing barycentric interpolation for RGB, Z, and UV simultaneously
Z-Buffer — Per-pixel depth testing with texture mapping from PSRAM
Framebuffer Output — Results written to PSRAM, read by host (RP2040) for display
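
The stages above can be sketched end to end in a few lines of Python. This is a software illustration under stated assumptions, not the firmware or the hardware iterator: all function names are hypothetical, Python floats stand in for FP16, the matrix is assumed row-major, attributes are interpolated with plain screen-space barycentric weights, and texture lookup and back-face culling are omitted.

```python
def transform(mvp, v):
    """Multiply a row-major 4x4 MVP by (x, y, z, 1), then divide by w."""
    x, y, z = v
    cx, cy, cz, cw = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in mvp)
    inv_w = 1.0 / cw  # the hardware would use its FP16 reciprocal here
    return (cx * inv_w, cy * inv_w, cz * inv_w)


def to_screen(ndc, width, height):
    """Map NDC x, y in [-1, 1] to pixel space (y flipped so row 0 is the top)."""
    return ((ndc[0] + 1.0) * 0.5 * width, (1.0 - ndc[1]) * 0.5 * height, ndc[2])


def edge(a, b, p):
    """Edge function: twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def draw_triangle(mvp, verts, colors, zbuf, fb):
    """Rasterize one triangle into fb with a depth test against zbuf."""
    height, width = len(fb), len(fb[0])
    a, b, c = (to_screen(transform(mvp, v), width, height) for v in verts)
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle, nothing to draw
    for py in range(height):
        for px in range(width):
            p = (px + 0.5, py + 0.5)  # sample at the pixel centre
            # Dividing by the full area gives barycentric weights that sum to 1.
            w0 = edge(b, c, p) / area
            w1 = edge(c, a, p) / area
            w2 = edge(a, b, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel centre lies outside the triangle
            z = w0 * a[2] + w1 * b[2] + w2 * c[2]
            if z < zbuf[py][px]:  # smaller z is treated as closer
                zbuf[py][px] = z
                fb[py][px] = tuple(
                    w0 * ca + w1 * cb + w2 * cc for ca, cb, cc in zip(*colors)
                )


# Usage: one RGB triangle on an 8x8 framebuffer with an identity MVP.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
fb = [[None] * 8 for _ in range(8)]
zbuf = [[float("inf")] * 8 for _ in range(8)]
draw_triangle(IDENTITY, [(-1, -1, 0.5), (1, -1, 0.5), (0, 1, 0.5)],
              [(1, 0, 0), (0, 1, 0), (0, 0, 1)], zbuf, fb)
```

Because the weights are normalised by the signed area, the sketch accepts either winding order. A real pipeline would typically use the sign of the area for back-face culling instead.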

SPIR-B Shader Format

Shaders are compiled from GLSL-like source to a compact binary format (SPIR-B) and loaded at runtime from PSRAM — no firmware reflash needed to change shaders.

SystemRDL & Hardware Command FIFO

The MMIO architecture is generated automatically via the Accellera SystemRDL standard using PeakRDL-chisel, emitting both the Chisel BorgGpuRegs layout and the C-headers directly.

It features a 2-entry asynchronous Command FIFO so the CPU can pack and queue drawing packets while the GPU handles geometry and rasterization in the background.
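
A small behavioural model shows what a 2-entry queue means for the CPU side: two packets can be in flight, and a third must wait until the GPU drains one. The class name, method names, and packet contents below are hypothetical and are not taken from the generated `BorgGpuRegs` layout. The model covers only depth and back-pressure, not timing or clocking.

```python
from collections import deque


class CommandFifo:
    """Behavioural model of a fixed-depth command queue."""

    DEPTH = 2  # matches the 2-entry FIFO described above

    def __init__(self):
        self._q = deque()

    @property
    def full(self) -> bool:
        return len(self._q) >= self.DEPTH

    def try_push(self, packet) -> bool:
        """CPU side: queue a packet, or report back-pressure when full."""
        if self.full:
            return False
        self._q.append(packet)
        return True

    def pop(self):
        """GPU side: take the oldest packet, or None when idle."""
        return self._q.popleft() if self._q else None


# Usage: the third submission is refused until the GPU consumes a packet.
fifo = CommandFifo()
accepted = [fifo.try_push(f"draw_{i}") for i in range(3)]  # hypothetical packets
first = fifo.pop()
retry = fifo.try_push("draw_2")
```

In firmware, the equivalent of `try_push` returning `False` would usually be a poll on a status bit before writing the next packet.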

TinyQV CPU

The CPU is based on Michael Bell's TinyQV, an RV32I RISC-V core with nibble-serial processing designed for Tiny Tapeout. The original Verilog was rewritten in Chisel and heavily modified, including an expanded register file (RV32E → RV32I), an integrated Borg peripheral bus, and a pipeline adapted for QSPI flash/PSRAM and UART.
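
Nibble-serial processing trades cycles for area: a 32-bit operation is computed 4 bits at a time over 8 steps, so only a 4-bit slice of the datapath is needed. The sketch below illustrates the idea for addition. It is a conceptual model with a hypothetical function name, not code derived from the TinyQV source.

```python
def nibble_serial_add(a: int, b: int) -> int:
    """Add two 32-bit values one nibble per step, carrying between steps."""
    result = 0
    carry = 0
    for step in range(8):              # 8 nibbles make up 32 bits
        shift = 4 * step
        na = (a >> shift) & 0xF        # current nibble of each operand
        nb = (b >> shift) & 0xF
        s = na + nb + carry            # 4-bit add with carry-in
        result |= (s & 0xF) << shift   # keep the low 4 bits
        carry = s >> 4                 # carry-out feeds the next step
    return result                      # final carry is dropped (wraps at 32 bits)
```

Only the carry bit has to persist between steps, which is why the adder itself stays 4 bits wide.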

Prerequisites

Building and Testing

Run all tests (Chisel + RTL cocotb)

make test-all

Individual test targets

make test-chisel-borg          # Borg FPU unit tests (Chisel)
make test-chisel-core          # TinyQV CPU tests (Chisel)
make test-cocotb-soc-core-rtl  # CPU SoC integration tests (cocotb)
make test-cocotb-soc-borg-rtl  # Borg peripheral tests (cocotb)

Cycle-Accurate C++ Simulation & Interactive Pygame UI

Fast C++ simulators (Arcilator and Verilator) validate the RTL and render frames locally without an FPGA. The Pygame viewer provides a real-time, cycle-accurate interactive view.

python simulation/verilator/viewer.py # Bind the Pygame UI to cycle-accurate rendering

FPGA (pico-ice)

Prerequisites: pico-ice FPGA + Raspberry Pi debug probe.

cd fpga
make burn           # Build bitstream and upload to FPGA
make triangle       # Run triangle rendering (vertex shader on FPGA, display on RP2040)

ASIC (Tiny Tapeout)

make gds            # Full RTL-to-GDS flow via LibreLane/OpenROAD

Milestones

Task	Status
FPU on software simulator (Chisel + cocotb)	✅ Done
FPU integrated into TinyQV SoC	✅ Done
Vertex shader on FPGA	✅ Done
Triangle rasterization + fragment shading	✅ Done
SPIR-B runtime shader loading	✅ Done
Per-vertex color interpolation	✅ Done
Dynamic framebuffer resolution	✅ Done
Tiny Tapeout TTIHP26a submission	✅ Submitted
32-bit RISC-V instructions & 32-entry register file	✅ Done
Hardware perspective projection (4×4 MVP shader)	✅ Done
Hardware FP16 reciprocal (FRCP)	✅ Done
Back-face culling & depth-correct vkcube	✅ Done
Hardware fragment interpolation	✅ Done
SystemRDL Automated Memory Mapping	✅ Done
Hardware Command FIFO (2-entry asynchronous submission)	✅ Done
Cycle-accurate C++ simulation (Arcilator & Verilator)	✅ Done
Interactive UI Viewer (zero-copy Pygame)	✅ Done
Test manufactured chip	⏳ Pending
Vulkan driver	📋 Planned

Software Bill of Materials

Component	Description	License
Chisel	Hardware construction language (Scala → Verilog)	Apache-2.0
TinyQV	RV32I RISC-V CPU core (rewritten in Chisel)	Apache-2.0
Berkeley HardFloat	IEEE-754 floating-point units (FMA)	BSD-3-Clause
LibreLane	RTL-to-GDS ASIC flow orchestrator	Apache-2.0
Yosys	RTL synthesis	ISC
OpenROAD	Place and route	BSD-3-Clause
Magic	Layout tool, DRC, GDS export	MIT
KLayout	GDS viewer and DRC	GPL-2.0
IHP SG13G2 PDK	IHP 130nm process design kit	Apache-2.0
cocotb	Python-based RTL simulation and testing	BSD-3-Clause
Icarus Verilog	Verilog simulation (cocotb backend)	GPL-2.0
Verilator	Verilog linting and simulation	LGPL-3.0
nextpnr	FPGA place and route (iCE40)	ISC
IceStorm	iCE40 FPGA bitstream tools	ISC
Netgen	LVS (Layout vs. Schematic)	MIT
GCC	RISC-V cross-compiler (`riscv32-embedded`)	GPL-3.0
Mill	Scala build tool	MIT
Tiny Tapeout Tools	Build and submission orchestrator	Apache-2.0
Nix	Reproducible development environment	LGPL-2.1
CIRCT/firtool	Chisel → Verilog compiler (FIRRTL)	Apache-2.0 (LLVM)
Arcilator	Cycle-accurate FIRRTL C++ simulator	Apache-2.0 (LLVM)
OpenJDK	Java runtime for Chisel/Mill	GPL-2.0 + CE
SystemRDL	Register logic definition standard	Accellera
PeakRDL	Toolchain for parsing and exporting SystemRDL	GPL-3.0
nanobind	Zero-overhead C++ to Python bindings	BSD-3-Clause
Pygame (SDL2)	Hardware-accelerated UI windowing subsystem	LGPL-2.1
