|
| 1 | +# HED-Python Developer Instructions |
| 2 | + |
| 3 | +> **Local environment**: If `.status/local-environment.md` exists in the repository root, read it first — it contains machine-specific shell, OS, and venv details (e.g. Windows/PowerShell vs Linux/bash) that override the generic commands shown here. |
| 4 | + |
| 5 | +## Code style |
| 6 | + |
| 7 | +- Google-style docstrings; use `Parameters:` not `Args:` |
| 8 | +- Line length: 120 characters (configured in `pyproject.toml`) |
| 9 | +- Markdown headers use sentence case: capitalize only the first word (and proper nouns/acronyms) |
| 10 | +- When creating work summaries, place them in `.status/` at the repository root |
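The docstring convention above can be illustrated with a minimal sketch. The function `count_tags` is a hypothetical helper invented for this example (it is not part of `hedtools`), and it splits naively on commas without handling parenthesized groups.

```python
def count_tags(hed_strings, separator=","):
    """Count the separator-delimited tags in each annotation string.

    Parameters:
        hed_strings (list of str): Raw annotation strings, e.g. "Event, Action".
        separator (str): Character that separates top-level tags.

    Returns:
        list of int: Number of non-empty tags found in each string.
    """
    # Naive split for illustration only; real HED parsing must respect parentheses.
    return [len([tag for tag in text.split(separator) if tag.strip()]) for text in hed_strings]
```

Note the `Parameters:` section header in place of the more common `Args:`, and that every line stays within the 120-character limit.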
| 11 | + |
| 12 | +## Project overview |
| 13 | + |
| 14 | +HED (Hierarchical Event Descriptors) is a framework for systematically describing events and experimental metadata. This Python repository (`hed-python`) provides the core **hedtools** package for validation, analysis, and transformation of HED-annotated datasets. HED is integrated into two major neuroimaging standards: BIDS (Brain Imaging Data Structure) and NWB (Neurodata Without Borders). |
| 15 | + |
| 16 | +### Related repositories |
| 17 | + |
| 18 | +- **[hed-schemas](https://github.com/hed-standard/hed-schemas)**: Standardized vocabularies (HED schemas) in XML/MediaWiki/OWL formats |
| 19 | +- **[hed-specification](https://github.com/hed-standard/hed-specification)**: Formal specification defining HED annotation rules |
| 20 | +- **[hed-examples](https://github.com/hed-standard/hed-examples)**: Example datasets and use cases (used as submodule in `spec_tests/`) |
| 21 | + |
| 22 | +### Package distribution |
| 23 | + |
| 24 | +- **PyPI Package**: `hedtools` (install via `pip install hedtools`) |
| 25 | +- **Python Version**: 3.10+ required |
| 26 | +- **Online Tools**: [hedtools.org](https://hedtools.org) for web-based validation/transformation |
| 27 | + |
| 28 | +## Architecture & core components |
| 29 | + |
| 30 | +### Three-layer architecture |
| 31 | + |
| 32 | +1. **Models Layer** (`hed/models/`): Core data structures |
| 33 | + |
| 34 | + - `HedString`: Parsed HED tag strings with schema validation |
| 35 | + - `HedTag`: Individual HED tags with canonical forms |
| 36 | + - `HedGroup`: Parenthesized tag groups |
| 37 | + - `HedSchema`: Schema definitions loaded from XML/MediaWiki/OWL (the class itself lives in `hed/schema/`) |
| 38 | + - `TabularInput`: BIDS-compliant tabular data files with sidecar integration |
| 39 | + - `SpreadsheetInput`: Excel/TSV file handling |
| 40 | + - `Sidecar`: JSON metadata files mapping event codes to HED tags |
| 41 | + - `DefinitionDict`: Manages HED definitions from annotations |
| 42 | + - `QueryHandler`: Search/query interface for HED annotations |
| 43 | + |
| 44 | +2. **Validation Layer** (`hed/validator/`): |
| 45 | + |
| 46 | + - `HedValidator`: Core tag validation against schema rules |
| 47 | + - `SidecarValidator`: JSON sidecar validation |
| 48 | + - `SpreadsheetValidator`: TSV/Excel validation with BIDS compliance |
| 49 | + - `DefValidator`: Definition/Def-expand tag validation |
| 50 | + - `OnsetValidator`: Temporal onset/offset/duration validation |
| 51 | + |
| 52 | +3. **Tools Layer** (`hed/tools/`): |
| 53 | + |
| 54 | + - **BIDS** (`bids/`): Dataset discovery, file grouping, inheritance handling |
| 55 | + - **Analysis** (`analysis/`): Event summarization, type analysis, temporal processing, tag counting |
| 56 | + - **Remodeling** (`remodeling/`): Transformation operations on tabular data |
| 57 | + - **Util** (`util/`): Shared utilities for data manipulation |
| 58 | + |
| 59 | +### Key data flow patterns |
| 60 | + |
| 61 | +**Schema Loading & Caching**: |
| 62 | + |
| 63 | +```python |
| 64 | +# Always use schema loading utilities from hed.schema |
| 65 | +from hed.schema import load_schema_version, load_schema |
| 67 | + |
| 68 | +# Load specific version (auto-cached in ~/.hedtools/) |
| 69 | +schema = load_schema_version("8.4.0") |
| 70 | + |
| 71 | +# Load from local file |
| 72 | +schema = load_schema("path/to/schema.xml") |
| 73 | +``` |
| 74 | + |
| 75 | +**HED String Processing**: |
| 76 | + |
| 77 | +```python |
| 78 | +# Standard pattern: parse → validate → analyze |
| 79 | +from hed import HedString, HedValidator, DefinitionDict |
| 80 | + |
| 81 | +hed_string = HedString("Event, Action/Button-press", schema) # schema: a loaded HedSchema |
| 82 | +def_dict = DefinitionDict() # For definitions if needed |
| 83 | +issues = HedValidator(schema, def_dicts=def_dict).validate(hed_string, allow_placeholders=False) |
| 84 | +``` |
| 85 | + |
| 86 | +**BIDS Integration**: |
| 87 | + |
| 88 | +```python |
| 89 | +# Use TabularInput for BIDS-compliant processing |
| 90 | +from hed import TabularInput |
| 91 | + |
| 92 | +tabular = TabularInput(events_file, sidecar=json_file) |
| 93 | +def_dict = tabular.get_def_dict(schema) # Extract definitions |
| 94 | +issues = tabular.validate(schema) # Validate entire file |
| 95 | +``` |
| 96 | + |
| 97 | +**Query/Search Operations**: |
| 98 | + |
| 99 | +```python |
| 100 | +# Use QueryHandler for searching HED annotations |
| 101 | +from hed import QueryHandler |
| 102 | + |
| 103 | +query = QueryHandler("Event and Action") |
| 104 | +search_results = query.search(hed_string) # hed_string: a parsed HedString |
| 105 | +``` |
| 106 | + |
| 107 | +## Development workflows |
| 108 | + |
| 109 | +### Testing strategy |
| 110 | + |
| 111 | +- Use `unittest` framework exclusively (not pytest) |
| 112 | +- Test structure: `tests/` mirrors `hed/` package structure |
| 113 | +- Run tests via VS Code tasks or PowerShell: |
| 114 | + - All tests: `.venv\Scripts\python.exe -m unittest discover tests -v` |
| 115 | + - Spec tests: `.venv\Scripts\python.exe -m unittest discover spec_tests -v` |
| 116 | + - Individual test: `.venv\Scripts\python.exe -m unittest tests.models.test_hed_string.TestHedStrings.test_constructor` |
| 117 | +- Test data stored in `tests/data/` subdirectories |
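As a rough sketch of the `unittest` style described above, a test module might look like the following. The function `normalize_tag` is a hypothetical stand-in for code under test, not a `hedtools` function; in the repository such a test would live under `tests/` in a path mirroring the module it covers.

```python
import unittest


def normalize_tag(tag):
    """Hypothetical helper under test: trim surrounding whitespace and slashes from a tag path."""
    return tag.strip().strip("/")


class TestNormalizeTag(unittest.TestCase):
    def test_strips_whitespace_and_slashes(self):
        cases = {" Event ": "Event", "/Action/Move/": "Action/Move", "Item": "Item"}
        for raw, expected in cases.items():
            # subTest reports each failing case separately instead of stopping at the first.
            with self.subTest(raw=raw):
                self.assertEqual(normalize_tag(raw), expected)
```

A class like this is picked up by `python -m unittest discover tests -v` without any pytest fixtures or plugins.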
| 118 | + |
| 119 | +### Schema integration |
| 120 | + |
| 121 | +- Schemas auto-downloaded and cached in `~/.hedtools/` (cross-platform) |
| 122 | +- Local schema copies bundled in releases for offline use |
| 123 | +- Test schemas in `tests/data/schema_tests/` for development |
| 124 | +- Always validate against multiple schema versions in tests |
| 125 | +- Schema formats: XML, MediaWiki, OWL (all equivalent internally) |
| 126 | + |
| 127 | +### Error handling conventions |
| 128 | + |
| 129 | +- Use `ErrorHandler` class for collecting validation issues |
| 130 | +- Return structured error dictionaries; never raise for validation failures |
| 131 | +- Log with `HedLogger` for debugging, not print statements |
| 132 | +- Error codes are defined in `hed/errors/error_types.py` and correspond to those in the `hed-specification` repository |
| 133 | +- Error messages are defined in `hed/errors/error_messages.py` and `hed/errors/schema_error_messages.py` |
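The "collect, don't raise" convention can be sketched without the library. The function `check_parentheses` and the dictionary keys `code`, `message`, and `index` are assumptions made for this illustration; they are not the issue format that `ErrorHandler` actually produces.

```python
def check_parentheses(hed_text):
    """Collect parenthesis problems as issue dictionaries instead of raising.

    Parameters:
        hed_text (str): Raw annotation text to scan.

    Returns:
        list of dict: One dictionary per problem; an empty list means no issues were found.
    """
    issues = []
    depth = 0
    for index, char in enumerate(hed_text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # Illustrative keys only, not the library's real issue structure.
                issues.append({"code": "UNMATCHED_CLOSE", "message": "Unmatched ')'", "index": index})
                depth = 0
    if depth > 0:
        issues.append({"code": "UNMATCHED_OPEN", "message": f"{depth} unclosed '('", "index": len(hed_text)})
    return issues
```

Because the caller always gets a list back, it can aggregate issues across many strings or files and decide for itself whether any of them is fatal.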
| 134 | + |
| 135 | +## BIDS-specific patterns |
| 136 | + |
| 137 | +### File discovery & inheritance |
| 138 | + |
| 139 | +```python |
| 140 | +# Use BidsDataset for proper BIDS traversal with inheritance |
| 141 | +from hed.tools.bids import BidsDataset |
| 142 | + |
| 143 | +dataset = BidsDataset(root_path) |
| 144 | +for file_group in dataset.iter_file_groups(["events"]): |
| 145 | + # file_group handles inheritance automatically |
| 146 | + tabular_file = file_group.get_tabular_file() |
| 147 | +``` |
| 148 | + |
| 149 | +### Sidecar inheritance chain |
| 150 | + |
| 151 | +- BIDS inheritance: dataset → subject → session → file level |
| 152 | +- Use `BidsFileGroup` to handle inheritance automatically |
| 153 | +- Never manually resolve inheritance - use built-in mechanisms |
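To make the inheritance order concrete, here is a conceptual sketch of what "more specific levels win" means. It is not how `BidsFileGroup` is implemented, and `merge_sidecars` is a hypothetical name; the sketch assumes that a more specific sidecar replaces whole top-level keys rather than merging nested values. Production code should still rely on the built-in mechanisms.

```python
def merge_sidecars(sidecars_by_level):
    """Merge sidecar dictionaries ordered from dataset level down to file level.

    Parameters:
        sidecars_by_level (list of dict): Sidecar contents, most general first.

    Returns:
        dict: Merged sidecar in which a more specific level replaces top-level keys of a more general one.
    """
    merged = {}
    for sidecar in sidecars_by_level:
        merged.update(sidecar)  # Later (more specific) levels win, key by key.
    return merged


dataset_level = {"event_type": {"HED": {"go": "Sensory-event"}}, "response_time": {"Units": "s"}}
subject_level = {"event_type": {"HED": {"go": "Sensory-event, Visual-presentation"}}}
effective = merge_sidecars([dataset_level, subject_level])
```

Here `effective` keeps `response_time` from the dataset level but takes `event_type` entirely from the subject level.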
| 154 | + |
| 155 | +## Remodeling operations architecture |
| 156 | + |
| 157 | +Located in `hed/tools/remodeling/operations/`: |
| 158 | + |
| 159 | +- All operations inherit from `BaseOp` |
| 160 | +- Define `PARAMS` JSON schema for validation |
| 161 | +- Implement `do_op(dispatcher, df, name, sidecar=None)` method |
| 162 | +- Use `Dispatcher` class to orchestrate multi-step transformations |
| 163 | +- Operations are JSON-configurable for reproducible analysis pipelines |
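The shape of an operation can be sketched with stand-ins. Everything below is hypothetical: `BaseOpSketch` imitates only the interface listed above, `drop_columns` is an invented operation name, a list of row dictionaries stands in for the DataFrame, and the loop at the end plays the role that `Dispatcher` has in the real package.

```python
import json


class BaseOpSketch:
    """Stand-in for the real base class; only the shape described above is modeled."""

    PARAMS = {}

    def __init__(self, parameters):
        self.parameters = parameters

    def do_op(self, dispatcher, df, name, sidecar=None):
        raise NotImplementedError


class DropColumnsSketch(BaseOpSketch):
    """Hypothetical operation that removes the named columns from every row."""

    PARAMS = {"type": "object", "properties": {"column_names": {"type": "array"}}, "required": ["column_names"]}

    def do_op(self, dispatcher, df, name, sidecar=None):
        drop = set(self.parameters["column_names"])
        return [{key: value for key, value in row.items() if key not in drop} for row in df]


# A JSON pipeline description; each step names an operation and supplies its parameters.
pipeline = json.loads('[{"operation": "drop_columns", "parameters": {"column_names": ["sample"]}}]')
registry = {"drop_columns": DropColumnsSketch}

rows = [{"onset": 1.0, "sample": 250}, {"onset": 2.5, "sample": 625}]
for step in pipeline:
    operation = registry[step["operation"]](step["parameters"])
    rows = operation.do_op(None, rows, "sub-01_events.tsv")
```

Keeping the pipeline as JSON data rather than code is what makes a transformation reproducible: the same file can be rerun on another dataset or archived alongside the results.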
| 164 | + |
| 165 | +## Development environment |
| 166 | + |
| 167 | +### Setup |
| 168 | + |
| 169 | +**Always** install in editable mode and activate the virtual environment before running any commands. See `.status/local-environment.md` for OS-specific activation and command syntax. |
| 170 | + |
| 171 | +```bash |
| 172 | +# Generic (adjust path separators / activation script for your OS) |
| 173 | +pip install -e ".[dev,test,docs,examples]" |
| 174 | +``` |
| 175 | + |
| 176 | +### Package structure |
| 177 | + |
| 178 | +- Entry point: `hed/__init__.py` exports main user API |
| 179 | +- Unified CLI entry point: `hedpy` → `hed/cli/cli.py` |
| 180 | +- Legacy CLI scripts in `hed/scripts/` (deprecated — prefer `hedpy`) |
| 181 | +- Version managed by `setuptools-scm`; `hed/_version.py` is auto-generated — do not edit |
| 182 | +- Configuration: `pyproject.toml` (build, ruff, typos, setuptools) |
| 183 | + |
| 184 | +### Dependencies |
| 185 | + |
| 186 | +- Python 3.10+ required; declared in `pyproject.toml` |
| 187 | +- Core: `pandas<3.0`, `numpy>=2`, `defusedxml`, `portalocker`, `click`, `semantic-version`, `inflect`, `openpyxl` |
| 188 | +- Dev extras: `ruff`, `typos`, `mdformat`; install with `pip install -e ".[dev]"` |
| 189 | + |
| 190 | +### Linting and formatting |
| 191 | + |
| 192 | +Run before every commit — these are enforced by CI: |
| 193 | + |
| 194 | +```bash |
| 195 | +# Check for lint errors |
| 196 | +ruff check hed/ tests/ |
| 197 | + |
| 198 | +# Check formatting |
| 199 | +ruff format --check hed/ tests/ |
| 200 | + |
| 201 | +# Auto-fix lint + format |
| 202 | +ruff check --fix --unsafe-fixes hed/ tests/ |
| 203 | +ruff format hed/ tests/ |
| 204 | + |
| 205 | +# Spell check (excludes tests/, yaml, json, xml — see pyproject.toml [tool.typos]) |
| 206 | +typos |
| 207 | +``` |
| 208 | + |
| 209 | +Ruff rules and line length (120) are configured in `pyproject.toml` under `[tool.ruff]`. |
| 210 | + |
| 211 | +### Running tests |
| 212 | + |
| 213 | +```bash |
| 214 | +# All unit tests |
| 215 | +python -m unittest discover tests -v |
| 216 | + |
| 217 | +# Spec-compliance tests (requires git submodules: spec_tests/hed-tests, hed-examples, hed-schemas) |
| 218 | +python -m unittest discover spec_tests -v |
| 219 | + |
| 220 | +# Single test |
| 221 | +python -m unittest tests.models.test_hed_string.TestHedStrings.test_constructor |
| 222 | +``` |
| 223 | + |
| 224 | +### CI/CD pipeline (`.github/workflows/`) |
| 225 | + |
| 226 | +| Workflow | File | Trigger | Purpose | |
| 227 | +| ------------- | --------------------- | --------------------------- | ----------------------------------------------------------------------- | |
| 228 | +| Tests | `ci.yaml` | push/PR to any branch | Python 3.10–3.14 on Ubuntu (main branch); 3.10 & 3.13 on other branches | |
| 229 | +| Coverage | `ci_cov.yaml` | push to main only | Coverage report, Python 3.10 | |
| 230 | +| Windows tests | `ci_windows.yaml` | push/PR to main | Python 3.10–3.12 on Windows | |
| 231 | +| Ruff | `ruff.yaml` | push/PR to main | Lint + format check | |
| 232 | +| Typos | `typos.yaml` | push/PR to main | Spelling check | |
| 233 | +| Spec tests | `spec_tests.yaml` | push/PR to any branch | HED specification compliance, Python 3.10 | |
| 234 | +| Docs | `docs.yaml` | push/PR to main | Sphinx build | |
| 235 | +| Notebooks | `notebook_tests.yaml` | push/PR to main | Jupyter notebook execution | |
| 236 | +| Links | `links.yaml` | scheduled + manual dispatch | Dead-link checker (lychee) | |
| 237 | + |
| 238 | +To replicate CI locally, run `ruff check`, `ruff format --check`, `typos`, and `python -m unittest discover tests -v` before pushing. |
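One possible way to bundle those checks is a small helper script. This is a sketch, not a script that exists in the repository: `run_local_ci` and `LOCAL_CI_COMMANDS` are invented names, and the `hed/ tests/` arguments are taken from the linting commands shown earlier in this document.

```python
import subprocess

# The four checks named above, in the order they are listed.
LOCAL_CI_COMMANDS = [
    ["ruff", "check", "hed/", "tests/"],
    ["ruff", "format", "--check", "hed/", "tests/"],
    ["typos"],
    ["python", "-m", "unittest", "discover", "tests", "-v"],
]


def run_local_ci(run=subprocess.run):
    """Run each check in order and return the commands that exited with a nonzero status.

    Parameters:
        run (callable): Function with the calling convention of subprocess.run; injectable for testing.

    Returns:
        list of list of str: The failing commands; empty when every check passed.
    """
    failed = []
    for command in LOCAL_CI_COMMANDS:
        if run(command).returncode != 0:
            failed.append(command)
    return failed
```

Calling `run_local_ci()` from the repository root with the virtual environment active runs all four checks and reports every failure, rather than stopping at the first one.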
| 239 | + |
| 240 | +## Schema development notes |
| 241 | + |
| 242 | +- Schemas can be stored in several formats (MediaWiki, XML, OWL); all are equivalent and load into the same internal representation. |
| 243 | +- Library schemas can be merged with the base schema if their header has the `withStandard` attribute. |
| 244 | +- Schema validation includes attribute checking and unit validation |
| 245 | +- Use `HedSchemaGroup` for multi-schema validation scenarios |
| 246 | +- Schema I/O handled by modules in `hed/schema/schema_io/` |
| 247 | + |
| 248 | +## Common pitfalls to avoid |
| 249 | + |
| 250 | +- Don't use hardcoded schema versions in production code |
| 251 | +- Don't modify schemas in place: they're cached and shared across processes, so treat them as immutable |
| 252 | +- Always activate the virtual environment before running Python/pip commands |
| 253 | +- Check `.status/local-environment.md` for shell-specific command syntax (e.g. PowerShell vs bash) |
| 254 | +- Don't mix pytest and unittest — this project uses `unittest` exclusively |
| 255 | +- Always use absolute imports from `hed` package, not relative imports |
| 256 | +- `hed/_version.py` is auto-generated by `setuptools-scm` — never edit it manually |
| 257 | +- `spec_tests/` contains git submodules; run `git submodule update --init --recursive` if spec tests fail to find data |