|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Build Commands |
| 6 | + |
| 7 | +```sh |
| 8 | +cargo build # debug build |
| 9 | +cargo test # run all tests |
| 10 | +cargo test --release # run all tests in release mode |
| 11 | +cargo fmt --check # check formatting |
| 12 | +cargo clippy # lint |
| 13 | +cargo test <test_name> # run only tests whose names contain <test_name> |
| 14 | +cargo run --release --example <name> -- <args> # run an example |
| 15 | +``` |
| 16 | + |
| 17 | +Available examples: read, write, auto-write, grep, parallel_range, streaming, network_streaming. Test data lives in `data/` (subset.bq, subset.vbq, subset.cbq, subset_R1.fastq.gz, subset_R2.fastq.gz). |
| 18 | + |
| 19 | +## Architecture |
| 20 | + |
| 21 | +binseq is a library (no binary targets) for reading and writing binary DNA sequence file formats. The CLI tool is in a separate repo (`bqtools`). |
| 22 | + |
| 23 | +### Three format variants |
| 24 | + |
| 25 | +Each lives in its own module with a reader and writer: |
| 26 | + |
| 27 | +- **BQ** (`src/bq/`) — Fixed-length records, 2-bit nucleotide encoding, no quality scores. Simplest and most compact. |
| 28 | +- **VBQ** (`src/vbq/`) — Variable-length records, row-based blocks, optional quality scores and headers, zstd compression, embedded index for random access. |
| 29 | +- **CBQ** (`src/cbq/`) — Variable-length records, columnar block storage, zstd compression, tracks N bases natively with Elias-Fano encoding. Recommended format. |
| 30 | + |
| 31 | +### Unified API |
| 32 | + |
| 33 | +- `BinseqReader` enum (`src/parallel.rs`) — dispatches over Bq/Vbq/Cbq MmapReaders for reading via memory-mapped I/O. |
| 34 | +- `BinseqWriter` enum + `BinseqWriterBuilder` (`src/write.rs`) — dispatches over Bq/Vbq/Cbq writers with a builder for configuration. |
| 35 | + |
| 36 | +### Key traits and types |
| 37 | + |
| 38 | +- `BinseqRecord` (`src/record/binseq_record.rs`) — trait for reading records; implemented by each format's RefRecord. |
| 39 | +- `SequencingRecord` + `SequencingRecordBuilder` (`src/record/sequencing_record.rs`) — zero-copy record type for writing, uses borrowed references. |
| 40 | +- `ParallelReader` / `ParallelProcessor` (`src/parallel.rs`) — traits for parallel range-based processing across threads. |
| 41 | +- `Policy` (`src/policy.rs`) — how to handle invalid nucleotides (ignore, break, random draw, set to specific base). |
| 42 | +- Error hierarchy (`src/error.rs`) — `thiserror`-based enums: HeaderError, ReadError, WriteError, CbqError, BuilderError, IndexError, ExtensionError. |
| 43 | + |
| 44 | +### Dependencies of note |
| 45 | + |
| 46 | +- `bitnuc` — nucleotide 2-bit and 4-bit encoding/decoding. |
| 47 | +- `paraseq` — parallel FASTX file parsing (optional, enabled by default). |
| 48 | +- `zstd` — block-level compression for VBQ and CBQ. |
| 49 | +- `memmap2` — memory-mapped file reading. |
| 50 | + |
| 51 | +## Conventions |
| 52 | + |
| 53 | +- Rust edition 2024. |
| 54 | +- Clippy pedantic enabled (`cast_possible_truncation` and `missing_errors_doc` allowed). |
| 55 | +- Native CPU target set in `.cargo/config.toml` (`-C target-cpu=native`). |
| 56 | +- All tests are inline `#[cfg(test)]` modules — no separate `tests/` directory. |
| 57 | +- Default features: `paraseq` and `anyhow`. FASTX encoding utilities (`src/utils/fastx.rs`) require the `paraseq` feature. |
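
To make the "2-bit nucleotide encoding" and invalid-nucleotide `Policy` notes above more concrete, here is a minimal, self-contained sketch of the general idea. It does not use `bitnuc` or binseq and is not their API: the `InvalidPolicy` enum, the `code`/`encode`/`decode` helpers, the bit mapping (A=0, C=1, G=2, T=3), and the lowest-bits-first packing order are all assumptions made for illustration, and only two simplified policy options are modeled.

```rust
/// Hypothetical, simplified stand-in for an invalid-nucleotide policy.
/// Not binseq's `Policy` type.
#[derive(Clone, Copy)]
enum InvalidPolicy {
    /// Reject the whole sequence when an invalid base is seen.
    Fail,
    /// Substitute a fixed base (expected to be one of A, C, G, T).
    SetTo(u8),
}

/// Map a base to a 2-bit code (assumed mapping: A=0, C=1, G=2, T=3).
fn code(base: u8) -> Option<u64> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Pack up to 32 bases into one u64, first base in the lowest two bits.
/// Returns None if the sequence is too long or the policy rejects it.
fn encode(seq: &[u8], policy: InvalidPolicy) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for (i, &base) in seq.iter().enumerate() {
        let bits = match (code(base), policy) {
            (Some(bits), _) => bits,
            (None, InvalidPolicy::SetTo(sub)) => code(sub)?,
            (None, InvalidPolicy::Fail) => return None,
        };
        packed |= bits << (2 * i);
    }
    Some(packed)
}

/// Unpack `len` bases (at most 32) from a packed word.
fn decode(packed: u64, len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| b"ACGT"[((packed >> (2 * i)) & 0b11) as usize])
        .collect()
}

fn main() {
    // The trailing N is replaced by A under the SetTo policy.
    let packed = encode(b"ACGTN", InvalidPolicy::SetTo(b'A')).expect("encodable");
    println!("{}", String::from_utf8_lossy(&decode(packed, 5))); // prints "ACGTA"

    // Under the Fail policy the same input is rejected.
    assert!(encode(b"ACGTN", InvalidPolicy::Fail).is_none());
}
```

Because four bases fit in a byte, a fixed-length record needs no per-base delimiter, which is consistent with BQ being described as the most compact variant. The real crates may differ in bit order, word size, and policy semantics.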