Contributing

This guide provides information for developers working on the chess engine codebase, including development setup, architecture, code standards, and optimization workflows.

Development Setup

Building and Running

# Build release version (required for performance testing)
cargo build --release

# Install the binary
cargo install --path .

# Run the engine
chess --help

IMPORTANT: Always use the installed chess binary (available via which chess) rather than ./target/release/chess. The binary is installed at ~/.cargo/bin/chess and should be in your PATH.

Testing and Quality

# Run all tests
cargo test

# Run benchmarks (requires release profile)
cargo bench

# Format code (required before commit)
cargo fmt

# Lint code (required before commit)
cargo clippy -- -D warnings

# Profile with flamegraph
sudo cargo flamegraph --bench pvp_benchmark

Common Subcommands

# Play against the engine
chess play --depth 4 --color white

# Calculate best move from FEN position
chess calculate-best-move --fen "1Q6/8/8/8/8/k1K5/8/8 w - - 0 1"

# Count positions (performance test)
chess count-positions --depth 6 --strategy alpha-beta

# Test against Stockfish
chess determine-stockfish-elo --depth 6 --starting-elo 2000

Architecture

Core Design Philosophy

The engine separates concerns into three layers:

Generic alpha-beta search (alpha_beta_searcher/) - Game-agnostic algorithm using traits
Chess-specific implementations (chess_search/) - Chess logic implementing the search traits
Board representation & move generation (board/, move_generator/) - Low-level chess mechanics

Key Architectural Patterns

Trait-Based Search: The alpha-beta algorithm is implemented generically using traits (MoveGenerator, Evaluator, MoveOrderer). Chess-specific implementations live in chess_search/implementation.rs. This separation enables:

Independent testing of search algorithm logic
Clean separation between algorithm and domain logic
Type-safe abstraction boundaries

Bitboard Representation: Board state uses 64-bit integers (Bitboard) for efficient bitwise operations. Individual squares are represented as newtype-wrapped u8 indices (Square). This enables CPU-level optimizations for move generation and attack calculations.

Move Type Hierarchy: Chess moves are represented as an enum (ChessMove) with variants for different move types:

StandardChessMove - Regular piece moves and captures
CastleChessMove - King and rook castling
EnPassantChessMove - Special pawn capture
PawnPromotionChessMove - Pawn reaching back rank

Each variant implements the ChessMoveEffect trait for applying/unapplying moves to board state.

State Management: Board maintains a StateStack for efficient undo operations. Move application pushes state, move reversal pops state. This enables the search algorithm to explore game trees without expensive cloning.

Zobrist Hashing: Hash tables generated at compile time by precompile/ build script enable incremental hashing for caching move generation and transposition table lookups.

Move Ordering: The ChessMoveOrderer prioritizes moves to improve alpha-beta pruning efficiency:

PV move from transposition table
Killer moves (from sibling beta cutoffs at the same ply)
Captures (MVV-LVA ordering)
Promotions
Quiet moves (history heuristic, then piece type)

Interior nodes use incremental selection (pick-best) instead of a full sort — the PV and killers are placed at front, then remaining moves are selected on demand via MoveOrderer::pick_next(). Root and quiescence nodes use full sort. Better ordering = more pruning = faster search.

Module Organization

common/ - Shared types between engine and precompiler

bitboard/bitboard.rs - 64-bit integer set operations
bitboard/square.rs - Newtype-wrapped u8 for board squares

precompile/ - Build-time code generation (runs at compile time)

zobrist/ - Generates Zobrist hash tables
magic/ - Computes magic bitboard numbers for sliding piece attacks
book/ - Generates opening book from PGN data

src/ - Main engine implementation

prelude.rs - Common type re-exports (Board, Color, Piece, ChessMove, Bitboard, Square)
alpha_beta_searcher/ - Generic search algorithm with traits
- search.rs - Core alpha-beta with iterative deepening, aspiration windows, quiescence, NMP, RFP, futility pruning, LMR, check extensions
- traits.rs - Game-agnostic trait definitions (MoveGenerator, Evaluator, MoveOrderer)
- transposition_table.rs - DashMap-backed cache with depth-preferred replacement
- killer_moves.rs - Thread-local killer move storage for parallel search
- tests.rs - Comprehensive search algorithm tests
chess_search/ - Chess-specific search implementations
- implementation.rs - Implements MoveGenerator, Evaluator traits for chess
- move_orderer.rs - Move ordering heuristics (MVV-LVA, killer moves, PV)
- history_table.rs - History heuristic for move ordering
- tests.rs - Chess search tests
board/ - Board state representation
- board.rs - Main Board struct with piece placement and game state
- piece_set.rs - Per-color piece bitboards
- state_stack.rs - Undo stack for move reversal
- color.rs, piece.rs - Color and piece type definitions
- castle_rights.rs, halfmove_clock.rs, fullmove_number.rs - Game state newtypes
- display.rs, error.rs, move_info.rs, position_info.rs - Board utilities
- tests.rs - Board state tests
chess_move/ - Move types and application
- chess_move.rs - ChessMove enum with all move variants
- chess_move_effect.rs - Trait for applying/unapplying moves to board
- standard.rs, castle.rs, en_passant.rs, pawn_promotion.rs, capture.rs - Move implementations
- algebraic_notation.rs - Display moves in standard notation
- traits.rs - Move-related traits
move_generator/ - Legal move generation
- generator.rs - Main move generation logic with caching
- targets.rs - Calculate attack/movement targets for pieces
- magic_table.rs - Magic bitboards for sliding pieces (rooks, bishops, queens)
evaluate/ - Position scoring
- evaluation.rs - Board evaluation heuristic (material + position)
- evaluation_tables.rs - Piece-square tables for positional scoring
game/ - Game loop and I/O
- loop.rs - Main game loop using InputSource and GameRenderer traits
- engine.rs - Computer player using alpha-beta search
- input_source.rs, renderer.rs - Trait abstractions for I/O
- display.rs, action.rs, mode.rs, util.rs - Game utilities
- stockfish_interface.rs, stockfish_elo.rs - Stockfish integration for testing
- position_counter.rs, alpha_beta_benchmark.rs - Performance testing utilities
book/ - Opening book
- book.rs - Opening book lookup for move suggestions
input_handler/ - Input parsing
- fen.rs, fen_serialize.rs - FEN notation parsing and serialization
- input.rs - User input handling
cli/ - Command-line interface
- args.rs - Argument parsing
- commands/ - Subcommand implementations (play, uci, benchmark, etc.)
uci/ - UCI protocol implementation
- protocol.rs - UCI protocol state machine
- command_parser.rs - UCI command parsing
- response_formatter.rs - UCI response formatting
tui/ - Terminal user interface
- app.rs - Main TUI application state and rendering
- board_widget.rs - Chess board widget for ratatui
- theme.rs - Color theme management
diagnostics/ - Performance diagnostics
- memory_profiler.rs - Memory allocation tracking and profiling

Important Implementation Details

chess_position! Macro: Instantiates board state from ASCII art. Used extensively in tests:

let board = chess_position! {
    rnbqkbnr
    pppppppp
    ........
    ........
    ........
    ........
    PPPPPPPP
    RNBQKBNR
};

Build Script: Cargo.toml specifies build = "precompile/src/main.rs". This runs the precompiler before compilation to generate Zobrist tables and magic numbers.

Release Profile Optimizations: Critical for performance testing (see Cargo.toml):

lto = true - Link-time optimization
codegen-units = 1 - Single codegen unit for better optimization
opt-level = 3 - Maximum optimization

Code Standards

Module Structure

The codebase follows idiomatic Rust module organization practices:

mod.rs files: Only module declarations and re-exports, no implementation
Implementation files: Dedicated .rs files (e.g., board/board.rs, not in mod.rs)
Module docstrings: Required //! comment at top of all module and implementation files

Import Organization

All imports follow a consistent ordering standard (enforced by the pre-commit hook):

Standard library (std::, core::)
External crates (common, rayon, smallvec, etc.)
Crate imports (crate::)
Relative imports (super::, self::)

Separate groups with blank lines. Alphabetically sort within groups.

Example:

use std::io;
use std::str::FromStr;

use common::bitboard::Square;
use rayon::prelude::*;

use crate::board::Board;
use crate::chess_move::ChessMove;

use super::helper::Helper;

Pre-Commit Hook

Located at .git/hooks/pre-commit. Runs automatically before commits:

cargo fmt -- --check - Verify formatting
cargo clippy -- -D warnings - Enforce zero warnings

If checks fail, commit is rejected. Fix issues before committing.

Code Quality Enforcement

rustfmt.toml - Configuration file documenting the import style standard and other formatting rules.
Pre-commit hook (.git/hooks/pre-commit) - Automatically runs cargo fmt -- --check and cargo clippy -- -D warnings before each commit to ensure code quality and consistency.

Profiling and Optimization

This section provides a systematic approach to profiling and optimizing codepaths in the chess engine. Use this as a template when optimizing any component.

Profiling Tools

CPU Profiling (Text-Based)

macOS:

chess count-positions --depth 6 2>&1 &
CHESS_PID=$!
sleep 1
sample $CHESS_PID 30 -file /tmp/profile.txt
wait $CHESS_PID
tail -n 100 /tmp/profile.txt  # See "Sort by top of stack" summary

Linux:

chess count-positions --depth 6 2>&1 &
CHESS_PID=$!
sleep 1
perf record -p $CHESS_PID sleep 30
perf report

Analyze the "Sort by top of stack" section to identify hot functions.

Memory Profiler (Built-In)

The engine includes instrumentation for tracking allocations:

Automatically tracks board clones and MoveGenerator allocations
Output shown after count-positions command
Add custom counters using MemoryProfiler::record_*() methods in src/diagnostics/memory_profiler.rs

Visual Profiling

Flamegraph:

sudo cargo flamegraph --bench pvp_benchmark

Debug Logging for Lock Contention

The engine includes debug-level logging for monitoring lock contention in critical sections. These messages are useful for identifying synchronization bottlenecks:

Enable debug logging:

RUST_LOG=debug chess play --depth 4

Key debug messages:

Slow killer get lock - Indicates when acquiring the killer moves lock takes >100µs
Slow killer store lock - Indicates when acquiring the killer moves lock for storage takes >100µs

These messages help identify when thread contention on the killer_moves mutex is impacting performance. High frequency of these messages suggests the parallel search strategy may need adjustment or the killer move storage mechanism should be optimized.

Note: These messages are logged at debug level by default and won't appear in normal operation. Use RUST_LOG=debug to enable them during profiling.

Latency Instrumentation (Built-In)

The engine includes compile-time instrumentation to measure function-level latency. This shows exactly where time is spent across all hot codepaths.

Build with instrumentation:

cargo build --release --features instrumentation

Run benchmark:

./target/release/chess benchmark-alpha-beta --depth 4

Output shows:

Function call counts (e.g., generate_moves: 8.3M calls)
Total time per function (e.g., quiescence_search: 4,868s)
Average time per call (e.g., apply: 1.22µs average)

Example output:

Function                                        Calls   Total (ms)     Avg (µs)
--------------------------------------------------------------------------------
quiescence_search                              536220   4867809.61      9078.01
alpha_beta_minimax                             216718   4803848.15     22166.36
generate_moves                                8317003   1275467.43       153.36
remove_invalid_moves                          8317003   1216735.59       146.29
generate_attack_targets                     336648619    454444.55         1.35
apply                                       321011382    392248.92         1.22

Overhead: Instrumentation adds ~13x overhead. Use this for understanding bottlenecks, not measuring absolute performance. Disable instrumentation for normal builds.

Optimization Workflow

1. Establish Baseline

Measure performance before making changes:

chess count-positions --depth 5 | tail -n 1

Record key metrics: positions/second, duration, memory profiler stats.

Quick feedback loops are essential: Use shallow depths (depth 4-5) during development for fast iteration. Deep profiling (depth 6+) should be reserved for final validation, as it takes significantly longer. The count-positions command provides immediate feedback on performance changes, allowing you to iterate quickly and validate optimizations before investing time in comprehensive profiling.

2. Profile to Identify Bottlenecks

Use CPU profiling to find:

Thread synchronization overhead (e.g., __psynch_cvwait on macOS)
Excessive allocations (board clones, object creations)
Algorithmic bottlenecks (not just hot functions)

3. Focus on Algorithmic Improvements

Prioritize optimizations that:

Reduce operation counts: Eliminate redundant allocations, avoid unnecessary cloning
Improve parallelization strategy: Multi-depth parallelization, conditional cloning
Share resources: Pass references instead of creating new instances

4. Measure After Each Change

Critical: Always verify correctness before measuring performance:

cargo test  # Must pass before measuring
chess count-positions --depth 5 | tail -n 1

Compare results to baseline and verify position counts match exactly.

Maintain quick feedback cycles: After each optimization, run tests and a quick performance check. This allows you to:

Catch regressions immediately
Validate that optimizations actually improve performance
Iterate rapidly without waiting for long-running benchmarks
Build confidence before investing in deeper profiling

Only run comprehensive profiling (depth 6+, full CPU profiling) after you've validated improvements with quick checks.

Key Lessons

High sample counts ≠ slow functions: Functions called frequently may show high counts but be fast
Compiler optimizations: Modern compilers with LTO already optimize many micro-operations
Real bottlenecks: Often in resource allocation patterns, parallelization strategy, or algorithmic complexity
Instrumentation is essential: Add counters to track allocations, clones, and operation counts
System variability: ±10% variation is normal; measure multiple times for small improvements

Performance Considerations

Always test performance changes with release builds (cargo build --release)
Use benchmarks (cargo bench) to measure impact of optimizations
The pvp_benchmark simulates engine-vs-engine gameplay for realistic profiling

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contributing

Development Setup

Building and Running

Testing and Quality

Common Subcommands

Architecture

Core Design Philosophy

Key Architectural Patterns

Module Organization

Important Implementation Details

Code Standards

Module Structure

Import Organization

Pre-Commit Hook

Code Quality Enforcement

Profiling and Optimization

Profiling Tools

CPU Profiling (Text-Based)

Memory Profiler (Built-In)

Visual Profiling

Debug Logging for Lock Contention

Latency Instrumentation (Built-In)

Optimization Workflow

1. Establish Baseline

2. Profile to Identify Bottlenecks

3. Focus on Algorithmic Improvements

4. Measure After Each Change

Key Lessons

Performance Considerations

FilesExpand file tree

CONTRIBUTING.md

Latest commit

History

CONTRIBUTING.md

File metadata and controls

Contributing

Development Setup

Building and Running

Testing and Quality

Common Subcommands

Architecture

Core Design Philosophy

Key Architectural Patterns

Module Organization

Important Implementation Details

Code Standards

Module Structure

Import Organization

Pre-Commit Hook

Code Quality Enforcement

Profiling and Optimization

Profiling Tools

CPU Profiling (Text-Based)

Memory Profiler (Built-In)

Visual Profiling

Debug Logging for Lock Contention

Latency Instrumentation (Built-In)

Optimization Workflow

1. Establish Baseline

2. Profile to Identify Bottlenecks

3. Focus on Algorithmic Improvements

4. Measure After Each Change

Key Lessons

Performance Considerations