This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
bqtools is a Rust CLI for working with BINSEQ files — a binary format family for high-performance DNA sequence processing. It encodes, decodes, greps, concatenates, samples, and pipes BINSEQ files (.bq, .vbq, .cbq). CBQ is the recommended format for most applications.
cargo build # Debug build
cargo build --release # Optimized build (uses LTO, slow)
cargo install --path . # Install binary locally
cargo test --verbose # Run all tests
cargo test --verbose -F fuzzy # Run tests including fuzzy feature
cargo test <test_name> # Run a single test by name
cargo fmt --check # Check formatting
cargo clippy --verbose # Lint (pedantic clippy enabled)

Logging is controlled via the BQTOOLS_LOG environment variable (uses env_logger).
- htslib (default): SAM/BAM/CRAM support via rust-htslib
- gcs (default): Google Cloud Storage file reading
- fuzzy (optional): Fuzzy matching via sassy; requires RUSTFLAGS="-C target-cpu=native"
Build without defaults: cargo build --no-default-features -F fuzzy,gcs
- src/cli/: Clap derive-based argument definitions. cli.rs has the top-level Commands enum. input.rs and output.rs handle complex input/output argument parsing (file formats, compression, paired-end, spans).
- src/commands/: Command implementations, each in its own subdirectory. utils.rs has shared compression helpers.
- src/types.rs: Type aliases (BoxedReader, BoxedWriter).
- src/main.rs: CLI dispatch and SIGPIPE handling.
Parallel processing: Commands use the paraseq crate's ParallelProcessor trait for embarrassingly parallel batch processing. Each command has a processor.rs implementing this trait with thread-local buffers and Arc<Mutex<T>> for shared global state.
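
The thread-local buffer plus Arc<Mutex<T>> pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the paraseq ParallelProcessor trait: the struct CountingProcessor, its method names, and count_in_parallel are hypothetical, and std::thread::scope stands in for paraseq's batch scheduling.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical processor (not the paraseq trait): each clone owns a
// thread-local counter and shares one global counter behind a mutex.
#[derive(Clone)]
struct CountingProcessor {
    local_count: usize,              // thread-local buffer, no locking needed
    global_count: Arc<Mutex<usize>>, // shared by every clone
}

impl CountingProcessor {
    fn new() -> Self {
        Self {
            local_count: 0,
            global_count: Arc::new(Mutex::new(0)),
        }
    }

    // Per-record work touches only thread-local state.
    fn process_record(&mut self, record: &[u8]) {
        if !record.is_empty() {
            self.local_count += 1;
        }
    }

    // The lock is taken once per batch, not once per record.
    fn on_batch_complete(&mut self) {
        let mut global = self.global_count.lock().unwrap();
        *global += self.local_count;
        self.local_count = 0;
    }
}

// Splits the records into batches and processes each batch on its own thread.
fn count_in_parallel(records: &[Vec<u8>], n_threads: usize) -> usize {
    let template = CountingProcessor::new();
    let n = n_threads.max(1);
    let chunk_size = ((records.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        for batch in records.chunks(chunk_size) {
            let mut worker = template.clone();
            s.spawn(move || {
                for record in batch {
                    worker.process_record(record);
                }
                worker.on_batch_complete();
            });
        }
    });
    let total = *template.global_count.lock().unwrap();
    total
}

fn main() {
    let records = vec![b"ACGT".to_vec(), b"GGCA".to_vec(), Vec::new(), b"TT".to_vec()];
    println!("non-empty records: {}", count_in_parallel(&records, 2));
}
```

Flushing the local count once per batch keeps lock contention low, which is the usual reason for pairing thread-local buffers with shared global state.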
Grep backends: The grep command uses a PatternMatcher enum dispatching to three backends — regex, aho-corasick (fixed-string, multi-pattern), and sassy (fuzzy, feature-gated). The same pattern applies to PatternCounter for the -P pattern-count mode. All backends accept PatternCollection which carries optional pattern names (from FASTA headers).
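
As a rough illustration of enum-based backend dispatch, the sketch below uses only the standard library. The Matcher enum and its variants are hypothetical stand-ins: Fixed does a naive multi-pattern substring search where bqtools uses aho-corasick, Hamming is a simple mismatch-tolerant scan where bqtools uses sassy, and the regex backend is omitted to keep the example dependency-free.

```rust
// Hypothetical stand-in for a PatternMatcher-style enum: callers use one
// is_match method and the enum decides which backend runs.
enum Matcher {
    // Any of several fixed strings (naive search, in place of aho-corasick).
    Fixed(Vec<Vec<u8>>),
    // One pattern with up to max_mismatches substitutions (in place of sassy).
    Hamming { pattern: Vec<u8>, max_mismatches: usize },
}

impl Matcher {
    fn is_match(&self, seq: &[u8]) -> bool {
        match self {
            Matcher::Fixed(patterns) => patterns.iter().any(|p| contains(seq, p)),
            Matcher::Hamming { pattern, max_mismatches } => {
                // windows(0) panics, so empty patterns never match.
                !pattern.is_empty()
                    && seq.windows(pattern.len()).any(|window| {
                        let mismatches = window
                            .iter()
                            .zip(pattern)
                            .filter(|(a, b)| a != b)
                            .count();
                        mismatches <= *max_mismatches
                    })
            }
        }
    }
}

// Naive substring search over bytes.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    let backends = [
        Matcher::Fixed(vec![b"ACGT".to_vec(), b"TTTT".to_vec()]),
        Matcher::Hamming { pattern: b"ACGT".to_vec(), max_mismatches: 1 },
    ];
    for backend in &backends {
        println!("{}", backend.is_match(b"GGACTTGG"));
    }
}
```

Because every variant answers the same is_match call, the code that filters records does not need to know which backend was selected from the command-line flags.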
Pattern types: patterns.rs defines Pattern (name + sequence) and PatternCollection (newtype over Vec<Pattern>) with methods .bytes(), .regexes(), .names(). Pattern files (--file, --sfile, --xfile) auto-detect FASTA vs plain text. FASTA headers become pattern names; plain text patterns have no name and fall back to the pattern string in output.
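
The name fallback and FASTA auto-detection can be sketched as follows. The field layout, the parse function, and the detection rule (first non-blank character is '>') are assumptions made for illustration, and the sketch only handles single-line FASTA records; the actual definitions live in patterns.rs.

```rust
// Hypothetical layout: a pattern with an optional name.
struct Pattern {
    name: Option<String>,
    sequence: Vec<u8>,
}

// Newtype over Vec<Pattern>, mirroring the description above.
struct PatternCollection(Vec<Pattern>);

impl PatternCollection {
    // Assumed detection rule: input whose first non-blank character is '>'
    // is treated as FASTA, anything else as one pattern per line.
    fn parse(text: &str) -> Self {
        let is_fasta = text.trim_start().starts_with('>');
        let mut patterns = Vec::new();
        let mut pending_name: Option<String> = None;
        for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
            if is_fasta && line.starts_with('>') {
                // A header names the sequence line that follows it.
                pending_name = Some(line[1..].to_string());
                continue;
            }
            patterns.push(Pattern {
                name: pending_name.take(),
                sequence: line.as_bytes().to_vec(),
            });
        }
        Self(patterns)
    }

    fn bytes(&self) -> Vec<&[u8]> {
        self.0.iter().map(|p| p.sequence.as_slice()).collect()
    }

    // Unnamed patterns fall back to the pattern string itself.
    fn names(&self) -> Vec<String> {
        self.0
            .iter()
            .map(|p| {
                p.name
                    .clone()
                    .unwrap_or_else(|| String::from_utf8_lossy(&p.sequence).into_owned())
            })
            .collect()
    }
}

fn main() {
    let fasta = PatternCollection::parse(">adapter\nACGT\n>primer\nTTGA\n");
    let plain = PatternCollection::parse("ACGT\nTTGA\n");
    println!("{:?} {:?}", fasta.names(), plain.names());
    println!("{} patterns", fasta.bytes().len());
}
```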
Encode modes: Encoding dispatches across atomic (single/paired files), recursive (directory walk via walkdir), manifest (file list), and batch (multi-file thread distribution) modes.
Writer abstraction: SplitWriter supports interleaved (single file) and split (separate R1/R2) output modes with polymorphic writers (file, stdout, compressed, chunked).
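
One way to picture the interleaved and split output modes is an enum over any std::io::Write sink. The variant names and write_pair are hypothetical and say nothing about how the real SplitWriter is laid out; a Box<dyn Write> (as the BoxedWriter alias suggests) could be used for W to cover files, stdout, and compressed streams.

```rust
use std::io::{self, Write};

// Hypothetical sketch: one sink for interleaved output, two for split output.
enum SplitWriter<W: Write> {
    Interleaved(W),
    Split { r1: W, r2: W },
}

impl<W: Write> SplitWriter<W> {
    // Writes one read pair; callers do not need to know the output mode.
    fn write_pair(&mut self, r1_record: &[u8], r2_record: &[u8]) -> io::Result<()> {
        match self {
            SplitWriter::Interleaved(writer) => {
                // R1 and R2 alternate in a single stream.
                writer.write_all(r1_record)?;
                writer.write_all(r2_record)
            }
            SplitWriter::Split { r1, r2 } => {
                // Each mate goes to its own stream.
                r1.write_all(r1_record)?;
                r2.write_all(r2_record)
            }
        }
    }
}

fn main() -> io::Result<()> {
    // In-memory buffers stand in for files or stdout.
    let mut writer = SplitWriter::Split {
        r1: Vec::<u8>::new(),
        r2: Vec::<u8>::new(),
    };
    writer.write_pair(b">read/1\nACGT\n", b">read/2\nTTGA\n")?;
    if let SplitWriter::Split { r1, r2 } = &writer {
        println!("R1: {} bytes, R2: {} bytes", r1.len(), r2.len());
    }
    Ok(())
}
```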
| Crate | Role |
|---|---|
| binseq | BINSEQ format read/write |
| bitnuc | 2-bit/4-bit nucleotide encoding |
| paraseq | Parallel FASTX/BINSEQ processing |
| clap | CLI argument parsing (derive) |
| anyhow | Error handling throughout |
Integration tests live in tests/. tests/common.rs provides a builder (write_fastx()) for generating random FASTQ/FASTA test data with configurable compression (none, gzip, zstd). Tests use cartesian products over format/compression/mode combinations. Dev dependencies: bon (builder macro), nucgen (random sequences), tempfile, itertools.
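
A cartesian product over test dimensions can be built with plain nested loops, as in the sketch below. The enums and test_matrix are hypothetical; in particular the Mode variants are a guess at what "mode" covers, and the real tests may build the product differently (itertools is listed as a dev dependency).

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Format {
    Fasta,
    Fastq,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Compression {
    Uncompressed, // the "none" option
    Gzip,
    Zstd,
}

// Hypothetical: the source does not list the modes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Single,
    Paired,
}

// Every combination of format, compression, and mode, one tuple per test case.
fn test_matrix() -> Vec<(Format, Compression, Mode)> {
    let mut cases = Vec::new();
    for format in [Format::Fasta, Format::Fastq] {
        for compression in [Compression::Uncompressed, Compression::Gzip, Compression::Zstd] {
            for mode in [Mode::Single, Mode::Paired] {
                cases.push((format, compression, mode));
            }
        }
    }
    cases
}

fn main() {
    for case in test_matrix() {
        // A real test would generate input for this case and run a command on it.
        println!("{:?}", case);
    }
}
```

Driving one test body from the full matrix means a new format or compression option is covered everywhere by adding a single array entry.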
Random FASTQ/FASTA test data can be created on the CLI with nucgen (cargo install nucgen if not already installed).
# generates 10,000 reads of length 150
nucgen -n 10000 -l 150 some.fq
# generates 30,000 paired-reads of length 50 and 200
nucgen -n 30000 -l 50 -L 200 some_R1.fq some_R2.fq

These can then be ingested with bqtools encode:
bqtools encode some.fq -o some.cbq
bqtools encode some_R1.fq some_R2.fq -o some.cbq

Use hyperfine (cargo install hyperfine if not already installed) to measure the performance of binaries after changes.
# Measures decoding performance
hyperfine --warmup 3 --runs 10 "bqtools decode some.cbq > /dev/null"

When making changes, keep the following documentation in sync:
- CLAUDE.md: Update this file when adding new commands, changing architecture, or modifying build/test workflows.
- README.md: Update usage examples and feature descriptions when adding or changing user-facing functionality (new commands, flags, behavior changes).
- Clap doc comments: All CLI arguments, flags, and subcommands use clap derive macros with /// doc comments and #[clap(long_about)] attributes. When adding or modifying flags, write clear help text directly on the struct fields in src/cli/. These doc comments are the --help output users see.
- New feature flags: If adding a Cargo feature flag, document it in both CLAUDE.md (Feature Flags section) and README.md (Feature Flags / Installation section).