Skip to content

maxron84/Branchless-Programming-Example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Branchless Programming Examples

📚 Educational Repository — Learn branchless programming techniques through practical, heavily-commented examples across multiple languages.


What Is Branchless Programming?

When you write an if statement, the compiled code creates a branch — a fork where the CPU must choose which path to take. Modern CPUs try to predict which branch you'll take before they actually know, executing instructions speculatively. When the prediction is wrong, all that speculative work is thrown away (a pipeline flush), costing 10–20+ clock cycles.

Branchless programming replaces if/else with arithmetic and bit tricks that always execute the same instructions, regardless of the input. The CPU never has to guess, so there's never a misprediction penalty.

A Simple Example

// ❌ Branching — CPU must predict which path
int min_branching(int a, int b) {
    if (a < b) return a;
    else return b;
}

// ✅ Branchless — same instructions every time
int min_branchless(int a, int b) {
    int diff = a - b;
    int mask = diff >> 31;       // 0 if a >= b, -1 if a < b
    return b + (diff & mask);    // b + 0 = b, or b + (a-b) = a
}

Both functions return the same result. The branchless version avoids the if by using the sign bit of (a - b) as a mask to select the right value.

When Should You Use This?

✅ Good use cases ❌ Bad use cases
Hot inner loops (millions of iterations) Code that runs once
SIMD / GPU shader code Business logic
Cryptography (constant-time prevents timing attacks) Readability-critical code
Sorting networks for small arrays General-purpose sorting

Rule of thumb: Write clear code first. Profile. Only go branchless if the profiler says branching is the bottleneck.


Techniques Demonstrated

Technique What It Does Where
Sign bit extraction x >> 31 gives 0 or -1, telling you if a number is negative All basic files
Bit masking value & mask where mask is all-0s or all-1s to select/zero a value program.c, program.js
Multiply by 0 or 1 int(condition) gives 0/1, multiply to keep or discard a value sorting.py, calculator.py
Function table lookup OPERATIONS[index](args) replaces if/elif chains calculator.py
2D transition tables TABLE[state][event] replaces switch/case state_machine.py
Arithmetic clamping Keep values in range without if (val > max) checks image_processing.py

Files

File Language What It Teaches
program.c C Sign detection, branchless min (bitmask & multiply approaches)
program.js JavaScript Same techniques — note JS bitwise ops use 32-bit signed ints
program.py Python Same techniques — includes to_signed32() helper for Python's arbitrary-precision ints
program.sh Bash Same techniques — uses 64-bit shifts (>> 63) for Bash's native integer size
program.sql MariaDB SQL Boolean expressions (a < b) return 0 or 1 — natural branchless building block
advanced/calculator.py Python Function table lookup, branchless safe division
advanced/sorting.py Python Sorting networks, branchless swap/clamp/abs
advanced/image_processing.py Python Pixel ops: threshold, blend, contrast, brightness
advanced/state_machine.py Python Table-driven game character state machine

Running the Examples

# C
gcc -o program program.c && ./program

# JavaScript (Node.js)
node program.js

# Python
python3 program.py

# Bash
bash program.sh

# SQL (requires MariaDB/MySQL)
mysql -u root < program.sql

# Advanced Python examples
python3 advanced/calculator.py
python3 advanced/sorting.py
python3 advanced/image_processing.py
python3 advanced/state_machine.py

Why Branchless? (Deep Dive)

Branch Prediction & Pipeline Stalls

Modern CPUs have instruction pipelines — they start executing the next instruction before the current one finishes (like an assembly line). When a branch (if) appears, the CPU doesn't know which instruction comes next until the condition is evaluated. Rather than stalling, it guesses (branch prediction) and starts executing the predicted path.

  • Correct guess: No penalty, full speed
  • Wrong guess: Pipeline flush → 10–20+ wasted cycles

For predictable branches (e.g., if (i < array.length) in a loop), the CPU predicts correctly ~99% of the time. But for unpredictable branches (e.g., if (data[i] > threshold) with random data), mispredictions can halve your throughput.

Benefits of Branchless Code

  • No misprediction penalties — arithmetic always executes the same way
  • Better pipelining — instructions flow without stalls
  • SIMD-friendly — enables vectorization across multiple data points simultaneously
  • Constant-time execution — critical in cryptography to prevent timing side-channel attacks

Trade-offs

  • Harder to read — bit manipulation is not intuitive (that's why this repo exists!)
  • May compute unnecessary values — both "branches" are always evaluated
  • Not always faster — for highly predictable branches, the CPU already does great
  • Profile before optimizing — don't make code harder to maintain for no measurable gain

About

Learn branchless programming techniques through practical examples across multiple languages.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors