This guide provides information for developers working on the chess engine codebase, including development setup, architecture, code standards, and optimization workflows.
# Build release version (required for performance testing)
cargo build --release
# Install the binary
cargo install --path .
# Run the engine
chess --helpIMPORTANT: Always use the installed chess binary (available via which chess) rather than ./target/release/chess. The binary is installed at ~/.cargo/bin/chess and should be in your PATH.
# Run all tests
cargo test
# Run benchmarks (requires release profile)
cargo bench
# Format code (required before commit)
cargo fmt
# Lint code (required before commit)
cargo clippy -- -D warnings
# Profile with flamegraph
sudo cargo flamegraph --bench pvp_benchmark# Play against the engine
chess play --depth 4 --color white
# Calculate best move from FEN position
chess calculate-best-move --fen "1Q6/8/8/8/8/k1K5/8/8 w - - 0 1"
# Count positions (performance test)
chess count-positions --depth 6 --strategy alpha-beta
# Test against Stockfish
chess determine-stockfish-elo --depth 6 --starting-elo 2000The engine separates concerns into three layers:
- Generic alpha-beta search (
alpha_beta_searcher/) - Game-agnostic algorithm using traits - Chess-specific implementations (
chess_search/) - Chess logic implementing the search traits - Board representation & move generation (
board/,move_generator/) - Low-level chess mechanics
Trait-Based Search: The alpha-beta algorithm is implemented generically using traits (MoveGenerator, Evaluator, MoveOrderer). Chess-specific implementations live in chess_search/implementation.rs. This separation enables:
- Independent testing of search algorithm logic
- Clean separation between algorithm and domain logic
- Type-safe abstraction boundaries
Bitboard Representation: Board state uses 64-bit integers (Bitboard) for efficient bitwise operations. Individual squares are represented as newtype-wrapped u8 indices (Square). This enables CPU-level optimizations for move generation and attack calculations.
Move Type Hierarchy: Chess moves are represented as an enum (ChessMove) with variants for different move types:
StandardChessMove- Regular piece moves and capturesCastleChessMove- King and rook castlingEnPassantChessMove- Special pawn capturePawnPromotionChessMove- Pawn reaching back rank
Each variant implements the ChessMoveEffect trait for applying/unapplying moves to board state.
State Management: Board maintains a StateStack for efficient undo operations. Move application pushes state, move reversal pops state. This enables the search algorithm to explore game trees without expensive cloning.
Zobrist Hashing: Hash tables generated at compile time by precompile/ build script enable incremental hashing for caching move generation and transposition table lookups.
Move Ordering: The ChessMoveOrderer prioritizes moves to improve alpha-beta pruning efficiency:
- PV move from transposition table
- Killer moves (from sibling beta cutoffs at the same ply)
- Captures (MVV-LVA ordering)
- Promotions
- Quiet moves (history heuristic, then piece type)
Interior nodes use incremental selection (pick-best) instead of a full sort — the PV and killers are placed at front, then remaining moves are selected on demand via MoveOrderer::pick_next(). Root and quiescence nodes use full sort. Better ordering = more pruning = faster search.
common/ - Shared types between engine and precompiler
bitboard/bitboard.rs- 64-bit integer set operationsbitboard/square.rs- Newtype-wrapped u8 for board squares
precompile/ - Build-time code generation (runs at compile time)
zobrist/- Generates Zobrist hash tablesmagic/- Computes magic bitboard numbers for sliding piece attacksbook/- Generates opening book from PGN data
src/ - Main engine implementation
prelude.rs- Common type re-exports (Board,Color,Piece,ChessMove,Bitboard,Square)alpha_beta_searcher/- Generic search algorithm with traitssearch.rs- Core alpha-beta with iterative deepening, aspiration windows, quiescence, NMP, RFP, futility pruning, LMR, check extensionstraits.rs- Game-agnostic trait definitions (MoveGenerator,Evaluator,MoveOrderer)transposition_table.rs- DashMap-backed cache with depth-preferred replacementkiller_moves.rs- Thread-local killer move storage for parallel searchtests.rs- Comprehensive search algorithm tests
chess_search/- Chess-specific search implementationsimplementation.rs- ImplementsMoveGenerator,Evaluatortraits for chessmove_orderer.rs- Move ordering heuristics (MVV-LVA, killer moves, PV)history_table.rs- History heuristic for move orderingtests.rs- Chess search tests
board/- Board state representationboard.rs- MainBoardstruct with piece placement and game statepiece_set.rs- Per-color piece bitboardsstate_stack.rs- Undo stack for move reversalcolor.rs,piece.rs- Color and piece type definitionscastle_rights.rs,halfmove_clock.rs,fullmove_number.rs- Game state newtypesdisplay.rs,error.rs,move_info.rs,position_info.rs- Board utilitiestests.rs- Board state tests
chess_move/- Move types and applicationchess_move.rs-ChessMoveenum with all move variantschess_move_effect.rs- Trait for applying/unapplying moves to boardstandard.rs,castle.rs,en_passant.rs,pawn_promotion.rs,capture.rs- Move implementationsalgebraic_notation.rs- Display moves in standard notationtraits.rs- Move-related traits
move_generator/- Legal move generationgenerator.rs- Main move generation logic with cachingtargets.rs- Calculate attack/movement targets for piecesmagic_table.rs- Magic bitboards for sliding pieces (rooks, bishops, queens)
evaluate/- Position scoringevaluation.rs- Board evaluation heuristic (material + position)evaluation_tables.rs- Piece-square tables for positional scoring
game/- Game loop and I/Oloop.rs- Main game loop usingInputSourceandGameRenderertraitsengine.rs- Computer player using alpha-beta searchinput_source.rs,renderer.rs- Trait abstractions for I/Odisplay.rs,action.rs,mode.rs,util.rs- Game utilitiesstockfish_interface.rs,stockfish_elo.rs- Stockfish integration for testingposition_counter.rs,alpha_beta_benchmark.rs- Performance testing utilities
book/- Opening bookbook.rs- Opening book lookup for move suggestions
input_handler/- Input parsingfen.rs,fen_serialize.rs- FEN notation parsing and serializationinput.rs- User input handling
cli/- Command-line interfaceargs.rs- Argument parsingcommands/- Subcommand implementations (play, uci, benchmark, etc.)
uci/- UCI protocol implementationprotocol.rs- UCI protocol state machinecommand_parser.rs- UCI command parsingresponse_formatter.rs- UCI response formatting
tui/- Terminal user interfaceapp.rs- Main TUI application state and renderingboard_widget.rs- Chess board widget for ratatuitheme.rs- Color theme management
diagnostics/- Performance diagnosticsmemory_profiler.rs- Memory allocation tracking and profiling
chess_position! Macro: Instantiates board state from ASCII art. Used extensively in tests:
let board = chess_position! {
rnbqkbnr
pppppppp
........
........
........
........
PPPPPPPP
RNBQKBNR
};Build Script: Cargo.toml specifies build = "precompile/src/main.rs". This runs the precompiler before compilation to generate Zobrist tables and magic numbers.
Release Profile Optimizations: Critical for performance testing (see Cargo.toml):
lto = true- Link-time optimizationcodegen-units = 1- Single codegen unit for better optimizationopt-level = 3- Maximum optimization
The codebase follows idiomatic Rust module organization practices:
mod.rsfiles: Only module declarations and re-exports, no implementation- Implementation files: Dedicated
.rsfiles (e.g.,board/board.rs, not inmod.rs) - Module docstrings: Required
//!comment at top of all module and implementation files
All imports follow a consistent ordering standard (enforced by the pre-commit hook):
- Standard library (
std::,core::) - External crates (
common,rayon,smallvec, etc.) - Crate imports (
crate::) - Relative imports (
super::,self::)
Separate groups with blank lines. Alphabetically sort within groups.
Example:
use std::io;
use std::str::FromStr;
use common::bitboard::Square;
use rayon::prelude::*;
use crate::board::Board;
use crate::chess_move::ChessMove;
use super::helper::Helper;Located at .git/hooks/pre-commit. Runs automatically before commits:
cargo fmt -- --check- Verify formattingcargo clippy -- -D warnings- Enforce zero warnings
If checks fail, commit is rejected. Fix issues before committing.
rustfmt.toml- Configuration file documenting the import style standard and other formatting rules.- Pre-commit hook (
.git/hooks/pre-commit) - Automatically runscargo fmt -- --checkandcargo clippy -- -D warningsbefore each commit to ensure code quality and consistency.
This section provides a systematic approach to profiling and optimizing codepaths in the chess engine. Use this as a template when optimizing any component.
macOS:
chess count-positions --depth 6 2>&1 &
CHESS_PID=$!
sleep 1
sample $CHESS_PID 30 -file /tmp/profile.txt
wait $CHESS_PID
tail -n 100 /tmp/profile.txt # See "Sort by top of stack" summaryLinux:
chess count-positions --depth 6 2>&1 &
CHESS_PID=$!
sleep 1
perf record -p $CHESS_PID sleep 30
perf reportAnalyze the "Sort by top of stack" section to identify hot functions.
The engine includes instrumentation for tracking allocations:
- Automatically tracks board clones and MoveGenerator allocations
- Output shown after
count-positionscommand - Add custom counters using
MemoryProfiler::record_*()methods insrc/diagnostics/memory_profiler.rs
Flamegraph:
sudo cargo flamegraph --bench pvp_benchmarkThe engine includes debug-level logging for monitoring lock contention in critical sections. These messages are useful for identifying synchronization bottlenecks:
Enable debug logging:
RUST_LOG=debug chess play --depth 4Key debug messages:
Slow killer get lock- Indicates when acquiring the killer moves lock takes >100µsSlow killer store lock- Indicates when acquiring the killer moves lock for storage takes >100µs
These messages help identify when thread contention on the killer_moves mutex is impacting performance. High frequency of these messages suggests the parallel search strategy may need adjustment or the killer move storage mechanism should be optimized.
Note: These messages are logged at debug level by default and won't appear in normal operation. Use RUST_LOG=debug to enable them during profiling.
The engine includes compile-time instrumentation to measure function-level latency. This shows exactly where time is spent across all hot codepaths.
Build with instrumentation:
cargo build --release --features instrumentationRun benchmark:
./target/release/chess benchmark-alpha-beta --depth 4Output shows:
- Function call counts (e.g.,
generate_moves: 8.3M calls) - Total time per function (e.g.,
quiescence_search: 4,868s) - Average time per call (e.g.,
apply: 1.22µs average)
Example output:
Function Calls Total (ms) Avg (µs)
--------------------------------------------------------------------------------
quiescence_search 536220 4867809.61 9078.01
alpha_beta_minimax 216718 4803848.15 22166.36
generate_moves 8317003 1275467.43 153.36
remove_invalid_moves 8317003 1216735.59 146.29
generate_attack_targets 336648619 454444.55 1.35
apply 321011382 392248.92 1.22
Overhead: Instrumentation adds ~13x overhead. Use this for understanding bottlenecks, not measuring absolute performance. Disable instrumentation for normal builds.
Measure performance before making changes:
chess count-positions --depth 5 | tail -n 1Record key metrics: positions/second, duration, memory profiler stats.
Quick feedback loops are essential: Use shallow depths (depth 4-5) during development for fast iteration. Deep profiling (depth 6+) should be reserved for final validation, as it takes significantly longer. The count-positions command provides immediate feedback on performance changes, allowing you to iterate quickly and validate optimizations before investing time in comprehensive profiling.
Use CPU profiling to find:
- Thread synchronization overhead (e.g.,
__psynch_cvwaiton macOS) - Excessive allocations (board clones, object creations)
- Algorithmic bottlenecks (not just hot functions)
Prioritize optimizations that:
- Reduce operation counts: Eliminate redundant allocations, avoid unnecessary cloning
- Improve parallelization strategy: Multi-depth parallelization, conditional cloning
- Share resources: Pass references instead of creating new instances
Critical: Always verify correctness before measuring performance:
cargo test # Must pass before measuring
chess count-positions --depth 5 | tail -n 1Compare results to baseline and verify position counts match exactly.
Maintain quick feedback cycles: After each optimization, run tests and a quick performance check. This allows you to:
- Catch regressions immediately
- Validate that optimizations actually improve performance
- Iterate rapidly without waiting for long-running benchmarks
- Build confidence before investing in deeper profiling
Only run comprehensive profiling (depth 6+, full CPU profiling) after you've validated improvements with quick checks.
- High sample counts ≠ slow functions: Functions called frequently may show high counts but be fast
- Compiler optimizations: Modern compilers with LTO already optimize many micro-operations
- Real bottlenecks: Often in resource allocation patterns, parallelization strategy, or algorithmic complexity
- Instrumentation is essential: Add counters to track allocations, clones, and operation counts
- System variability: ±10% variation is normal; measure multiple times for small improvements
- Always test performance changes with release builds (
cargo build --release) - Use benchmarks (
cargo bench) to measure impact of optimizations - The
pvp_benchmarksimulates engine-vs-engine gameplay for realistic profiling