|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +VRS-Python is the reference implementation of the GA4GH Variation Representation Specification (VRS), providing Python language support for representing genomic variation data consistently and uniquely. |
| 8 | + |
| 9 | +## Development Commands |
| 10 | + |
| 11 | +### Environment Setup |
| 12 | +- `make devready` - Create Python 3.12 venv, install dev dependencies, set up pre-commit hooks |
| 13 | +- `make nbready` - Create Python 3.12 venv, install notebook dependencies for Jupyter usage |
| 14 | +- `source venv/3.12/bin/activate` - Activate the virtual environment (required before development) |
| 15 | + |
| 16 | +### Development Tasks |
| 17 | +- `make test` - Run pytest test suite |
| 18 | +- `make lint` - Static analysis with ruff (auto-fix enabled) |
| 19 | +- `make format` - Code formatting with ruff |
| 20 | +- `pytest tests/path/to/specific_test.py` - Run individual test files |
| 21 | +- `pytest tests/path/to/specific_test.py::test_function_name` - Run specific test functions |
| 22 | + |
| 23 | +### Docker Services |
| 24 | +- `docker-compose up` - Start external dependencies (seqrepo-rest-service on port 5000, UTA database on port 5432) |
| 25 | +- Required for full functionality of VRS translation and normalization features |
| 26 | + |
| 27 | +### Package Installation |
| 28 | +- `pip install -e '.[dev,extras,notebooks]'` - Install in development mode with all dependencies (quoted so shells such as zsh do not expand the brackets) |
| 29 | +- `pip install -e '.[extras]'` - Install with core extras (SeqRepo, HGVS tools, etc.) |
| 30 | + |
| 31 | +## Code Architecture |
| 32 | + |
| 33 | +### Core Components |
| 34 | +- **`src/ga4gh/vrs/models.py`** - Pydantic models for VRS objects (Allele, Location, etc.) |
| 35 | +- **`src/ga4gh/vrs/normalize.py`** - Allele normalization algorithms per VRS specification |
| 36 | +- **`src/ga4gh/vrs/dataproxy.py`** - Abstract interface and SeqRepo implementation for sequence data access |
| 37 | +- **`src/ga4gh/vrs/enderef.py`** - Conversion between inlined and referenced forms of VRS objects |
| 38 | + |
| 39 | +### Key Modules |
| 40 | +- **`src/ga4gh/vrs/extras/translator.py`** - Translates between VRS and external formats (HGVS, SPDI, gnomAD, Beacon) |
| 41 | +- **`src/ga4gh/vrs/extras/annotator/`** - Tools for annotating VCF files with VRS identifiers |
| 42 | +- **`src/ga4gh/vrs/utils/hgvs_tools.py`** - HGVS parsing and validation utilities |
| 43 | +- **`src/ga4gh/core/`** - Core GA4GH models and identifier generation |
| 44 | + |
| 45 | +### Testing Structure |
| 46 | +- **`tests/`** - Main test directory with pytest configuration |
| 47 | +- **`tests/validation/`** - VRS specification compliance tests |
| 48 | +- **`tests/extras/`** - Tests for translator and annotator functionality |
| 49 | +- **`tests/cassettes/`** - VCR.py cassettes for external API mocking |
| 50 | + |
| 51 | +## Development Notes |
| 52 | + |
| 53 | +### Dependencies |
| 54 | +- Requires Python >= 3.10 (Python 3.12 required for development) |
| 55 | +- External services: SeqRepo (sequence data), UTA (transcript alignments) |
| 56 | +- Key libraries: Pydantic 2.x, bioutils, HGVS, requests |
| 57 | + |
| 58 | +### Testing Environment |
| 59 | +- Uses pytest with coverage reporting |
| 60 | +- VCR.py for API response caching |
| 61 | +- Test data includes local SeqRepo instance at `tests/data/seqrepo/` |
| 62 | +- Set `SEQREPO_ROOT_DIR=tests/data/seqrepo/latest` for tests |
| 63 | + |
| 64 | +### Pre-commit Configuration |
| 65 | +- Ruff linting and formatting |
| 66 | +- Automatic code quality checks before commits |
| 67 | +- Install with `pre-commit install` (done automatically by `make devready`) |
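
The Docker Services notes above say that translation and normalization depend on seqrepo-rest-service (port 5000) and the UTA database (port 5432). A minimal sketch of a pre-flight check for those two ports follows. It is not part of the repository: the helper names `port_is_open` and `check_services` are hypothetical, and it assumes the containers publish their ports on `localhost`. It only confirms that something accepts TCP connections on each port, not that the service behind it is healthy.

```python
import socket

# Ports come from the Docker Services notes; the service labels and the
# default host ("localhost") are assumptions made for this sketch.
SERVICES = {"seqrepo-rest-service": 5000, "uta": 5432}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host: str = "localhost", services=None) -> dict:
    """Map each service name to whether its port accepts connections."""
    services = SERVICES if services is None else services
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

Running the script after `docker-compose up` gives a quick indication of whether the external dependencies are listening before starting tests that need them.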