Commit eb69709

Merge pull request #89 from ArcInstitute/setup-claude-md
feat: added a claude md for the root
2 parents f8239e6 + 20b2df6 commit eb69709

1 file changed

Lines changed: 57 additions & 0 deletions

CLAUDE.md

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Build Commands

```sh
cargo build                                       # debug build
cargo test                                        # run all tests
cargo test --release                              # run all tests in release mode
cargo fmt --check                                 # check formatting
cargo clippy                                      # lint
cargo test <test_name>                            # run a single test by name
cargo run --release --example <name> -- <args>    # run an example
```

Available examples: read, write, auto-write, grep, parallel_range, streaming, network_streaming. Test data lives in `data/` (subset.bq, subset.vbq, subset.cbq, subset_R1.fastq.gz, subset_R2.fastq.gz).

## Architecture

binseq is a library (no binary targets) for reading and writing binary DNA sequence file formats. The CLI tool is in a separate repo (`bqtools`).

### Three format variants

Each lives in its own module with a reader and writer:

- **BQ** (`src/bq/`) — Fixed-length records, 2-bit nucleotide encoding, no quality scores. Simplest and most compact.
- **VBQ** (`src/vbq/`) — Variable-length records, row-based blocks, optional quality scores and headers, zstd compression, embedded index for random access.
- **CBQ** (`src/cbq/`) — Variable-length records, columnar block storage, zstd compression, tracks N bases natively with Elias-Fano encoding. Recommended format.
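
A minimal sketch of the Elias-Fano idea, for orientation only: a sorted list of positions (here, hypothetical N offsets within one read) is split into packed low bits plus unary-coded high bits. `EliasFano`, `set_bit`, and `get_bit` are hypothetical names, and this is not how `src/cbq/` actually lays out its blocks.

```rust
/// Hypothetical Elias-Fano container for a non-decreasing list of positions.
struct EliasFano {
    len: usize,      // number of encoded values
    low_bits: u32,   // bits per value kept verbatim in `lower`
    lower: Vec<u64>, // packed low parts, `low_bits` bits per value
    upper: Vec<u64>, // unary high parts: value i sets bit (high_i + i)
}

/// Set bit `pos` in a growable bit vector.
fn set_bit(bits: &mut Vec<u64>, pos: usize) {
    let word = pos / 64;
    if bits.len() <= word {
        bits.resize(word + 1, 0);
    }
    bits[word] |= 1u64 << (pos % 64);
}

/// Read bit `pos`; bits past the end read as 0.
fn get_bit(bits: &[u64], pos: usize) -> bool {
    bits.get(pos / 64).is_some_and(|&w| (w >> (pos % 64)) & 1 == 1)
}

impl EliasFano {
    /// `values` must be non-decreasing and every value must be < `universe`.
    fn encode(values: &[u64], universe: u64) -> Self {
        let n = values.len() as u64;
        // l = floor(log2(universe / n)); 0 when there is no room for low bits.
        let low_bits = if n == 0 || universe <= n { 0 } else { (universe / n).ilog2() };
        let l = low_bits as usize;
        let (mut lower, mut upper) = (Vec::new(), Vec::new());
        for (i, &v) in values.iter().enumerate() {
            assert!(v < universe, "value out of range");
            // Store the low `l` bits of v verbatim.
            for b in 0..l {
                if (v >> b) & 1 == 1 {
                    set_bit(&mut lower, i * l + b);
                }
            }
            // High part in unary: bucket number plus the count of earlier values.
            set_bit(&mut upper, (v >> l) as usize + i);
        }
        EliasFano { len: values.len(), low_bits, lower, upper }
    }

    /// Recover the values: the i-th set upper bit at position p gives high = p - i.
    fn decode(&self) -> Vec<u64> {
        let l = self.low_bits as usize;
        let mut out = Vec::with_capacity(self.len);
        let mut pos = 0;
        while out.len() < self.len {
            if get_bit(&self.upper, pos) {
                let i = out.len();
                let high = (pos - i) as u64;
                let mut low = 0u64;
                for b in 0..l {
                    if get_bit(&self.lower, i * l + b) {
                        low |= 1 << b;
                    }
                }
                out.push((high << l) | low);
            }
            pos += 1;
        }
        out
    }
}

fn main() {
    // Hypothetical N positions inside a 150 bp read.
    let n_positions = [3u64, 4, 17, 90, 149];
    let ef = EliasFano::encode(&n_positions, 150);
    assert_eq!(ef.decode(), n_positions);
    println!("low_bits = {}", ef.low_bits); // 150 / 5 = 30, floor(log2 30) = 4
}
```

Storage comes to roughly `2 + log2(universe / len)` bits per value, which is why the scheme suits sparse positions such as N bases.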

### Unified API

- `BinseqReader` enum (`src/parallel.rs`) — dispatches over Bq/Vbq/Cbq MmapReaders for reading via memory-mapped I/O.
- `BinseqWriter` enum + `BinseqWriterBuilder` (`src/write.rs`) — dispatches over Bq/Vbq/Cbq writers with a builder for configuration.
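
The enum-dispatch shape behind `BinseqReader` and `BinseqWriter` can be sketched without the crate. Every name below (`BqReader`, `VbqReader`, `CbqReader`, `RecordSource`, `AnyReader`, `from_extension`) is hypothetical, and choosing the variant by file extension is an assumption made only for the example.

```rust
/// Hypothetical per-format readers; `records` stands in for actually reading a file.
struct BqReader { records: usize }
struct VbqReader { records: usize }
struct CbqReader { records: usize }

/// Hypothetical shared interface, not the crate's real reader methods.
trait RecordSource {
    fn num_records(&self) -> usize;
    fn format_name(&self) -> &'static str;
}

impl RecordSource for BqReader {
    fn num_records(&self) -> usize { self.records }
    fn format_name(&self) -> &'static str { "BQ" }
}

impl RecordSource for VbqReader {
    fn num_records(&self) -> usize { self.records }
    fn format_name(&self) -> &'static str { "VBQ" }
}

impl RecordSource for CbqReader {
    fn num_records(&self) -> usize { self.records }
    fn format_name(&self) -> &'static str { "CBQ" }
}

/// One concrete type wrapping whichever format was opened; calls forward via `match`.
enum AnyReader {
    Bq(BqReader),
    Vbq(VbqReader),
    Cbq(CbqReader),
}

impl AnyReader {
    /// Pick a variant from the file extension (illustrative assumption; a real
    /// implementation might inspect the file header instead).
    fn from_extension(path: &str, records: usize) -> Option<Self> {
        match path.rsplit_once('.')?.1 {
            "bq" => Some(Self::Bq(BqReader { records })),
            "vbq" => Some(Self::Vbq(VbqReader { records })),
            "cbq" => Some(Self::Cbq(CbqReader { records })),
            _ => None,
        }
    }
}

impl RecordSource for AnyReader {
    fn num_records(&self) -> usize {
        match self {
            Self::Bq(r) => r.num_records(),
            Self::Vbq(r) => r.num_records(),
            Self::Cbq(r) => r.num_records(),
        }
    }

    fn format_name(&self) -> &'static str {
        match self {
            Self::Bq(r) => r.format_name(),
            Self::Vbq(r) => r.format_name(),
            Self::Cbq(r) => r.format_name(),
        }
    }
}

fn main() {
    // The record count is made up; only the path mirrors the repo's test data.
    let reader = AnyReader::from_extension("data/subset.vbq", 42).expect("unknown extension");
    println!("{} file with {} records", reader.format_name(), reader.num_records());
}
```

Compared with `Box<dyn RecordSource>`, the closed enum avoids a heap allocation and a vtable call, and the compiler flags any `match` that misses a format when a variant is added.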

### Key traits and types

- `BinseqRecord` (`src/record/binseq_record.rs`) — trait for reading records; implemented by each format's RefRecord.
- `SequencingRecord` + `SequencingRecordBuilder` (`src/record/sequencing_record.rs`) — zero-copy record type for writing, uses borrowed references.
- `ParallelReader` / `ParallelProcessor` (`src/parallel.rs`) — traits for parallel range-based processing across threads.
- `Policy` (`src/policy.rs`) — how to handle invalid nucleotides (ignore, break, random draw, set to specific base).
- Error hierarchy (`src/error.rs`) — `thiserror`-based enums: HeaderError, ReadError, WriteError, CbqError, BuilderError, IndexError, ExtensionError.
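
A rough sketch of range-based parallel processing, under assumptions rather than the real `ParallelReader` / `ParallelProcessor` signatures: records are random-access (a slice stands in for a memory-mapped file), each thread gets its own clone of the processor plus a contiguous index range, and the caller merges per-thread results. `RecordProcessor`, `GcCounter`, `split_ranges`, and `process_parallel` are hypothetical names.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical per-thread processor; each worker owns its own clone.
trait RecordProcessor: Clone + Send {
    fn process(&mut self, index: usize, seq: &[u8]);
}

/// Example processor: counts G/C bases and total bases seen by one thread.
#[derive(Clone, Default)]
struct GcCounter {
    gc: usize,
    total: usize,
}

impl RecordProcessor for GcCounter {
    fn process(&mut self, _index: usize, seq: &[u8]) {
        self.gc += seq.iter().filter(|&&b| b == b'G' || b == b'C').count();
        self.total += seq.len();
    }
}

/// Split `0..n` into at most `parts` contiguous, non-empty ranges.
fn split_ranges(n: usize, parts: usize) -> Vec<Range<usize>> {
    let parts = parts.max(1);
    let chunk = n.div_ceil(parts);
    (0..parts)
        .map(|i| (i * chunk).min(n)..((i + 1) * chunk).min(n))
        .filter(|r| !r.is_empty())
        .collect()
}

/// Run one scoped thread per range, each with a fresh clone of `proto`,
/// and return the per-thread processors so the caller can merge them.
fn process_parallel<P: RecordProcessor>(records: &[&[u8]], threads: usize, proto: &P) -> Vec<P> {
    thread::scope(|s| {
        let handles: Vec<_> = split_ranges(records.len(), threads)
            .into_iter()
            .map(|range| {
                let mut worker = proto.clone();
                s.spawn(move || {
                    for i in range {
                        worker.process(i, records[i]);
                    }
                    worker
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Hypothetical records standing in for sequences decoded from a binseq file.
    let records: [&[u8]; 5] = [b"ACGT", b"GGCC", b"ATAT", b"CGNA", b"TTTT"];
    let partials = process_parallel(&records, 2, &GcCounter::default());
    // Merge the per-thread partial results once all threads have joined.
    let gc: usize = partials.iter().map(|c| c.gc).sum();
    let total: usize = partials.iter().map(|c| c.total).sum();
    println!("GC {gc}/{total}"); // prints GC 8/20
}
```

Cloning a processor per thread keeps the hot loop free of locks; the trade-off is an explicit merge step at the end.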

### Dependencies of note

- `bitnuc` — nucleotide 2-bit and 4-bit encoding/decoding.
- `paraseq` — parallel FASTX file parsing (optional, enabled by default).
- `zstd` — block-level compression for VBQ and CBQ.
- `memmap2` — memory-mapped file reading.
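
What 2-bit encoding means in practice, as a minimal sketch rather than `bitnuc`'s API: each base maps to two bits and up to 32 bases pack into one `u64`. The A=0, C=1, G=2, T=3 mapping and the lowest-bits-first layout are assumptions. Bases outside ACGT have no 2-bit code, which is the situation `Policy` exists to resolve; this sketch simply returns `None`.

```rust
/// 2-bit code for one base, or `None` for N and other non-ACGT symbols.
/// The A=0, C=1, G=2, T=3 mapping is an assumption for illustration.
fn base_to_bits(base: u8) -> Option<u64> {
    match base {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Pack up to 32 bases into one u64, first base in the lowest two bits.
fn pack_2bit(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None; // a u64 holds at most 32 two-bit codes
    }
    let mut packed = 0u64;
    for (i, &base) in seq.iter().enumerate() {
        packed |= base_to_bits(base)? << (2 * i);
    }
    Some(packed)
}

/// Inverse of `pack_2bit` for `len <= 32`. The length must be supplied separately
/// because trailing A's are all-zero bits and look the same as unused bits.
fn unpack_2bit(packed: u64, len: usize) -> Vec<u8> {
    const ALPHABET: [u8; 4] = [b'A', b'C', b'G', b'T'];
    (0..len)
        .map(|i| ALPHABET[((packed >> (2 * i)) & 0b11) as usize])
        .collect()
}

fn main() {
    let packed = pack_2bit(b"ACGT").expect("valid ACGT sequence");
    println!("{packed:#010b}"); // prints 0b11100100
    assert_eq!(unpack_2bit(packed, 4), b"ACGT");
    assert_eq!(pack_2bit(b"ACNT"), None); // N has no 2-bit code
}
```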

## Conventions

- Rust edition 2024.
- Clippy pedantic enabled (`cast_possible_truncation` and `missing_errors_doc` allowed).
- Native CPU target set in `.cargo/config.toml` (`-C target-cpu=native`).
- All tests are inline `#[cfg(test)]` modules — no separate `tests/` directory.
- Default features: `paraseq` and `anyhow`. FASTX encoding utilities (`src/utils/fastx.rs`) require the `paraseq` feature.
