📚 Educational Repository — Learn branchless programming techniques through practical, heavily-commented examples across multiple languages.
When you write an if statement, the compiled code creates a branch — a fork where the CPU must choose which path to take. Modern CPUs try to predict which branch you'll take before they actually know, executing instructions speculatively. When the prediction is wrong, all that speculative work is thrown away (a pipeline flush), costing 10–20+ clock cycles.
Branchless programming replaces if/else with arithmetic and bit tricks that always execute the same instructions, regardless of the input. The CPU never has to guess, so there's never a misprediction penalty.
// ❌ Branching — CPU must predict which path
int min_branching(int a, int b) {
if (a < b) return a;
else return b;
}
// ✅ Branchless — same instructions every time
int min_branchless(int a, int b) {
int diff = a - b;
int mask = diff >> 31; // 0 if a >= b, -1 if a < b
return b + (diff & mask); // b + 0 = b, or b + (a-b) = a
}Both functions return the same result. The branchless version avoids the if by using the sign bit of (a - b) as a mask to select the right value.
| ✅ Good use cases | ❌ Bad use cases |
|---|---|
| Hot inner loops (millions of iterations) | Code that runs once |
| SIMD / GPU shader code | Business logic |
| Cryptography (constant-time prevents timing attacks) | Readability-critical code |
| Sorting networks for small arrays | General-purpose sorting |
Rule of thumb: Write clear code first. Profile. Only go branchless if the profiler says branching is the bottleneck.
| Technique | What It Does | Where |
|---|---|---|
| Sign bit extraction | x >> 31 gives 0 or -1, telling you if a number is negative |
All basic files |
| Bit masking | value & mask where mask is all-0s or all-1s to select/zero a value |
program.c, program.js |
| Multiply by 0 or 1 | int(condition) gives 0/1, multiply to keep or discard a value |
sorting.py, calculator.py |
| Function table lookup | OPERATIONS[index](args) replaces if/elif chains |
calculator.py |
| 2D transition tables | TABLE[state][event] replaces switch/case |
state_machine.py |
| Arithmetic clamping | Keep values in range without if (val > max) checks |
image_processing.py |
| File | Language | What It Teaches |
|---|---|---|
program.c |
C | Sign detection, branchless min (bitmask & multiply approaches) |
program.js |
JavaScript | Same techniques — note JS bitwise ops use 32-bit signed ints |
program.py |
Python | Same techniques — includes to_signed32() helper for Python's arbitrary-precision ints |
program.sh |
Bash | Same techniques — uses 64-bit shifts (>> 63) for Bash's native integer size |
program.sql |
MariaDB SQL | Boolean expressions (a < b) return 0 or 1 — natural branchless building block |
advanced/calculator.py |
Python | Function table lookup, branchless safe division |
advanced/sorting.py |
Python | Sorting networks, branchless swap/clamp/abs |
advanced/image_processing.py |
Python | Pixel ops: threshold, blend, contrast, brightness |
advanced/state_machine.py |
Python | Table-driven game character state machine |
# C
gcc -o program program.c && ./program
# JavaScript (Node.js)
node program.js
# Python
python3 program.py
# Bash
bash program.sh
# SQL (requires MariaDB/MySQL)
mysql -u root < program.sql
# Advanced Python examples
python3 advanced/calculator.py
python3 advanced/sorting.py
python3 advanced/image_processing.py
python3 advanced/state_machine.pyModern CPUs have instruction pipelines — they start executing the next instruction before the current one finishes (like an assembly line). When a branch (if) appears, the CPU doesn't know which instruction comes next until the condition is evaluated. Rather than stalling, it guesses (branch prediction) and starts executing the predicted path.
- Correct guess: No penalty, full speed
- Wrong guess: Pipeline flush → 10–20+ wasted cycles
For predictable branches (e.g., if (i < array.length) in a loop), the CPU predicts correctly ~99% of the time. But for unpredictable branches (e.g., if (data[i] > threshold) with random data), mispredictions can halve your throughput.
- No misprediction penalties — arithmetic always executes the same way
- Better pipelining — instructions flow without stalls
- SIMD-friendly — enables vectorization across multiple data points simultaneously
- Constant-time execution — critical in cryptography to prevent timing side-channel attacks
- Harder to read — bit manipulation is not intuitive (that's why this repo exists!)
- May compute unnecessary values — both "branches" are always evaluated
- Not always faster — for highly predictable branches, the CPU already does great
- Profile before optimizing — don't make code harder to maintain for no measurable gain