Commit 9c70089

Cleaned up actions and tool configuration
2 parents 3bf4ff6 + 1197ab1 commit 9c70089

6 files changed

Lines changed: 299 additions & 48 deletions

.gitattributes

Lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-hed/_version.py export-subst
-
 # Set default behavior to automatically normalize line endings to LF
 * text=auto eol=lf
 

.github/copilot-instructions.md

Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
+# HED-Python Developer Instructions
+
+> **Local environment**: If `.status/local-environment.md` exists in the repository root, read it first. It contains machine-specific shell, OS, and venv details (e.g. Windows/PowerShell vs Linux/bash) that override the generic commands shown here.
+
+## Code style
+
+- Google-style docstrings; use `Parameters:` not `Args:`
+- Line length: 120 characters (configured in `pyproject.toml`)
+- Markdown headers use sentence case: capitalize only the first word (and proper nouns/acronyms)
+- When creating work summaries, place them in `.status/` at the repository root
+
+## Project overview
+
+HED (Hierarchical Event Descriptors) is a framework for systematically describing events and experimental metadata. This Python repository (`hed-python`) provides the core **hedtools** package for validation, analysis, and transformation of HED-annotated datasets. HED is integrated into two major neuroimaging standards: BIDS (Brain Imaging Data Structure) and NWB (Neurodata Without Borders).
+
+### Related repositories
+
+- **[hed-schemas](https://github.com/hed-standard/hed-schemas)**: Standardized vocabularies (HED schemas) in XML/MediaWiki/OWL formats
+- **[hed-specification](https://github.com/hed-standard/hed-specification)**: Formal specification defining HED annotation rules
+- **[hed-examples](https://github.com/hed-standard/hed-examples)**: Example datasets and use cases (used as a submodule in `spec_tests/`)
+
+### Package distribution
+
+- **PyPI Package**: `hedtools` (install via `pip install hedtools`)
+- **Python Version**: 3.10+ required
+- **Online Tools**: [hedtools.org](https://hedtools.org) for web-based validation/transformation
+
+## Architecture & core components
+
+### Three-layer architecture
+
+1. **Models Layer** (`hed/models/`): Core data structures
+
+   - `HedString`: Parsed HED tag strings with schema validation
+   - `HedTag`: Individual HED tags with canonical forms
+   - `HedGroup`: Parenthesized tag groups
+   - `HedSchema`: Schema definitions loaded from XML/MediaWiki/OWL
+   - `TabularInput`: BIDS-compliant tabular data files with sidecar integration
+   - `SpreadsheetInput`: Excel/TSV file handling
+   - `Sidecar`: JSON metadata files mapping event codes to HED tags
+   - `DefinitionDict`: Manages HED definitions from annotations
+   - `QueryHandler`: Search/query interface for HED annotations
+
+2. **Validation Layer** (`hed/validator/`):
+
+   - `HedValidator`: Core tag validation against schema rules
+   - `SidecarValidator`: JSON sidecar validation
+   - `SpreadsheetValidator`: TSV/Excel validation with BIDS compliance
+   - `DefValidator`: Definition/Def-expand tag validation
+   - `OnsetValidator`: Temporal onset/offset/duration validation
+
+3. **Tools Layer** (`hed/tools/`):
+
+   - **BIDS** (`bids/`): Dataset discovery, file grouping, inheritance handling
+   - **Analysis** (`analysis/`): Event summarization, type analysis, temporal processing, tag counting
+   - **Remodeling** (`remodeling/`): Transformation operations on tabular data
+   - **Util** (`util/`): Shared utilities for data manipulation
+
+### Key data flow patterns
+
+**Schema Loading & Caching**:
+
+```python
+# Always use schema loading utilities from hed.schema
+from hed.schema import load_schema_version, load_schema
+from hed import HedSchema
+
+# Load specific version (auto-cached in ~/.hedtools/)
+schema = load_schema_version("8.4.0")
+
+# Load from local file
+schema = load_schema("path/to/schema.xml")
+```
+
+**HED String Processing**:
+
+```python
+# Standard pattern: parse → validate → analyze
+from hed import HedString, HedValidator, DefinitionDict
+
+hed_string = HedString("Event, Action/Button-press", schema)
+def_dict = DefinitionDict() # For definitions if needed
+issues = HedValidator(schema).validate(hed_string, def_dict)
+```
+
+**BIDS Integration**:
+
+```python
+# Use TabularInput for BIDS-compliant processing
+from hed import TabularInput
+
+tabular = TabularInput(events_file, sidecar=json_file)
+def_dict = tabular.get_def_dict(schema) # Extract definitions
+issues = tabular.validate(schema) # Validate entire file
+```
+
+**Query/Search Operations**:
+
+```python
+# Use QueryHandler for searching HED annotations
+from hed import QueryHandler, get_query_handlers
+
+query = QueryHandler("Event and Action")
+search_results = query.search(hed_string)
+```
+
+## Development workflows
+
+### Testing strategy
+
+- Use `unittest` framework exclusively (not pytest)
+- Test structure: `tests/` mirrors `hed/` package structure
+- Run tests via VS Code tasks or PowerShell:
+  - All tests: `.venv\Scripts\python.exe -m unittest discover tests -v`
+  - Spec tests: `.venv\Scripts\python.exe -m unittest discover spec_tests -v`
+  - Individual test: `.venv\Scripts\python.exe -m unittest tests.models.test_hed_string.TestHedStrings.test_constructor`
+- Test data stored in `tests/data/` subdirectories
+
+### Schema integration
+
+- Schemas auto-downloaded and cached in `~/.hedtools/` (cross-platform)
+- Local schema copies bundled in releases for offline use
+- Test schemas in `tests/data/schema_tests/` for development
+- Always validate against multiple schema versions in tests
+- Schema formats: XML, MediaWiki, OWL (all equivalent internally)
+
+### Error handling conventions
+
+- Use `ErrorHandler` class for collecting validation issues
+- Return structured error dictionaries, never raise for validation failures
+- Log with `HedLogger` for debugging, not print statements
+- Error codes defined in `hed/errors/error_types.py` and reference `hed-specification` repository
+- Error messages in `hed/errors/error_messages.py` and `hed/errors/schema_error_messages.py`
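
The "return structured error dictionaries, never raise" convention above can be sketched in plain Python. This is a minimal illustration only: `check_tags`, the dictionary keys, and the use of `"TAG_INVALID"` as a code are assumptions made for the example and do not reproduce hedtools' `ErrorHandler` API.

```python
# Hypothetical helper: collects problems as dictionaries instead of raising.
# It does not use hedtools; names and keys are illustrative assumptions.
def check_tags(tags, known_tags):
    """Return a list of issue dictionaries, one per unrecognized tag."""
    issues = []
    for index, tag in enumerate(tags):
        if tag not in known_tags:
            issues.append({"code": "TAG_INVALID", "message": f"Unknown tag: {tag}", "index": index})
    return issues


issues = check_tags(["Event", "Bogus"], {"Event", "Action"})
for issue in issues:
    # The caller decides how to report; nothing was raised.
    print(f"{issue['code']}: {issue['message']}")
```

Returning a list lets a caller accumulate issues across many strings or files and report them together, which is harder to do when the first failure raises.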
+
+## BIDS-specific patterns
+
+### File discovery & inheritance
+
+```python
+# Use BidsDataset for proper BIDS traversal with inheritance
+from hed.tools.bids import BidsDataset
+
+dataset = BidsDataset(root_path)
+for file_group in dataset.iter_file_groups(["events"]):
+    # file_group handles inheritance automatically
+    tabular_file = file_group.get_tabular_file()
+```
+
+### Sidecar inheritance chain
+
+- BIDS inheritance: dataset → subject → session → file level
+- Use `BidsFileGroup` to handle inheritance automatically
+- Never manually resolve inheritance - use built-in mechanisms
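
To make the inheritance order concrete, here is a conceptual sketch in plain Python. It assumes that sidecars are merged from least to most specific and that a more specific level replaces whole top-level keys; `resolve_sidecar` and the sample dictionaries are hypothetical, and real code should rely on `BidsFileGroup` as stated above rather than on anything like this.

```python
# Conceptual illustration only: NOT how hedtools resolves inheritance.
# Assumption: levels are ordered dataset -> subject -> session -> file,
# and a more specific level replaces an entire top-level key.
def resolve_sidecar(levels):
    """Merge sidecar dictionaries ordered from least to most specific."""
    merged = {}
    for sidecar in levels:
        merged.update(sidecar)
    return merged


dataset_level = {
    "event_type": {"HED": {"go": "Sensory-event"}},
    "response": {"HED": "Agent-action"},
}
subject_level = {"event_type": {"HED": {"go": "Sensory-event, Visual-presentation"}}}

resolved = resolve_sidecar([dataset_level, subject_level])
print(resolved["event_type"]["HED"]["go"])
```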
+
+## Remodeling operations architecture
+
+Located in `hed/tools/remodeling/operations/`:
+
+- All operations inherit from `BaseOp`
+- Define `PARAMS` JSON schema for validation
+- Implement `do_op(dispatcher, df, name, sidecar=None)` method
+- Use `Dispatcher` class to orchestrate multi-step transformations
+- Operations are JSON-configurable for reproducible analysis pipelines
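
The shape of an operation described above might look roughly like the following. This is a sketch under stated assumptions: the `BaseOp` class here is a local stand-in (not the hedtools class), `RenameColumnOp` is invented for illustration, and a list of row dictionaries stands in for the DataFrame so the example runs without pandas.

```python
class BaseOp:
    """Local stand-in for illustration; NOT hedtools' BaseOp."""

    NAME = "base"
    PARAMS = {}

    def __init__(self, parameters):
        self.parameters = parameters

    def do_op(self, dispatcher, df, name, sidecar=None):
        raise NotImplementedError


class RenameColumnOp(BaseOp):
    """Hypothetical operation that renames one column."""

    NAME = "rename_column_sketch"
    # JSON-schema-style description of the accepted parameters.
    PARAMS = {
        "type": "object",
        "properties": {"old_name": {"type": "string"}, "new_name": {"type": "string"}},
        "required": ["old_name", "new_name"],
    }

    def do_op(self, dispatcher, df, name, sidecar=None):
        old, new = self.parameters["old_name"], self.parameters["new_name"]
        # Build new rows rather than mutating the input in place.
        return [{(new if key == old else key): value for key, value in row.items()} for row in df]


op = RenameColumnOp({"old_name": "trial_type", "new_name": "event_type"})
rows = [{"onset": 1.0, "trial_type": "go"}]
result = op.do_op(None, rows, "sub-01_events.tsv")
```

Because the parameters are plain data, the same operation can be described in a JSON file and replayed, which is the reproducibility point the list makes.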
+
+## Development environment
+
+### Setup
+
+**Always** install in editable mode and activate the virtual environment before running any commands. See `.status/local-environment.md` for OS-specific activation and command syntax.
+
+```bash
+# Generic (adjust path separators / activation script for your OS)
+pip install -e ".[dev,test,docs,examples]"
+```
+
+### Package structure
+
+- Entry point: `hed/__init__.py` exports main user API
+- Unified CLI entry point: `hedpy` (`hed/cli/cli.py`)
+- Legacy CLI scripts in `hed/scripts/` (deprecated; prefer `hedpy`)
+- Version managed by `setuptools-scm`; `hed/_version.py` is auto-generated, so do not edit it
+- Configuration: `pyproject.toml` (build, ruff, typos, setuptools)
+
+### Dependencies
+
+- Python 3.10+ required; declared in `pyproject.toml`
+- Core: `pandas<3.0`, `numpy>=2`, `defusedxml`, `portalocker`, `click`, `semantic-version`, `inflect`, `openpyxl`
+- Dev extras: `ruff`, `typos`, `mdformat`; install with `pip install -e ".[dev]"`
+
+### Linting and formatting
+
+Run before every commit; these are enforced by CI:
+
+```bash
+# Check for lint errors
+ruff check hed/ tests/
+
+# Check formatting
+ruff format --check hed/ tests/
+
+# Auto-fix lint + format
+ruff check --fix --unsafe-fixes hed/ tests/
+ruff format hed/ tests/
+
+# Spell check (excludes tests/, yaml, json, xml; see pyproject.toml [tool.typos])
+typos
+```
+
+Ruff rules and line length (120) are configured in `pyproject.toml` under `[tool.ruff]`.
+
+### Running tests
+
+```bash
+# All unit tests
+python -m unittest discover tests -v
+
+# Spec-compliance tests (requires git submodules: spec_tests/hed-tests, hed-examples, hed-schemas)
+python -m unittest discover spec_tests -v
+
+# Single test
+python -m unittest tests.models.test_hed_string.TestHedStrings.test_constructor
+```
+
+### CI/CD pipeline (`.github/workflows/`)
+
+| Workflow      | File                  | Trigger                     | Purpose                                                                 |
+| ------------- | --------------------- | --------------------------- | ----------------------------------------------------------------------- |
+| Tests         | `ci.yaml`             | push/PR to any branch       | Python 3.10–3.14 on Ubuntu (main branch); 3.10 & 3.13 on other branches |
+| Coverage      | `ci_cov.yaml`         | push to main only           | Coverage report, Python 3.10                                            |
+| Windows tests | `ci_windows.yaml`     | push/PR to main             | Python 3.10–3.12 on Windows                                             |
+| Ruff          | `ruff.yaml`           | push/PR to main             | Lint + format check                                                     |
+| Typos         | `typos.yaml`          | push/PR to main             | Spelling check                                                          |
+| Spec tests    | `spec_tests.yaml`     | push/PR to any branch       | HED specification compliance, Python 3.10                               |
+| Docs          | `docs.yaml`           | push/PR to main             | Sphinx build                                                            |
+| Notebooks     | `notebook_tests.yaml` | push/PR to main             | Jupyter notebook execution                                              |
+| Links         | `links.yaml`          | scheduled + manual dispatch | Dead-link checker (lychee)                                              |
+
+To replicate CI locally, run `ruff check`, `ruff format --check`, `typos`, and `python -m unittest discover tests -v` before pushing.
+
+## Schema development notes
+
+- Schemas come in multiple formats (MediaWiki, XML, OWL); all formats are equivalent and are loaded into the same internal representation.
+- Library schemas can be merged with the base schema if they have the `withStandard` attribute in their header.
+- Schema validation includes attribute checking and unit validation
+- Use `HedSchemaGroup` for multi-schema validation scenarios
+- Schema I/O handled by modules in `hed/schema/schema_io/`
+
+## Common pitfalls to avoid
+
+- Don't use hardcoded schema versions in production code
+- Don't modify schemas in-place: they're cached/shared across processes and are immutable
+- Always activate the virtual environment before running Python/pip commands
+- Check `.status/local-environment.md` for shell-specific command syntax (e.g. PowerShell vs bash)
+- Don't mix pytest and unittest; this project uses `unittest` exclusively
+- Always use absolute imports from the `hed` package, not relative imports
+- `hed/_version.py` is auto-generated by `setuptools-scm`; never edit it manually
+- `spec_tests/` contains git submodules; run `git submodule update --init --recursive` if spec tests fail to find data

.gitignore

Lines changed: 0 additions & 3 deletions
@@ -138,6 +138,3 @@ Desktop.ini
 schema_cache_test/
 hed_cache/
 spec_tests/*.json
-
-# GitHub Copilot instructions (project-specific)
-.github/copilot-instructions.md

.lycheeignore

Lines changed: 0 additions & 42 deletions
This file was deleted.

hed/schema/schema_io/hed_id_util.py

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ def _get_hedid_range(schema_name, df_key):
 
     Parameters:
         schema_name(str): The known schema name with an assigned id range.
-        df_key(str): The dataframe range type we're interested in.
+        df_key(str): The dataframe section type. Must be a key in object_type_id_offset
+            (STRUCT_KEY is not accepted and will raise NotImplementedError).
 
     Returns:
         set: A set of all id's in the requested range.

lychee.toml

Lines changed: 40 additions & 0 deletions
@@ -44,8 +44,48 @@ exclude_path = [
 
 # Exclude specific URLs from checking (by regex)
 exclude = [
+    # Local/internal URLs
     '^http://127\.0\.0\.',
     '^http://localhost',
     '^https://localhost',
     '^file://',
+
+    # ScienceDirect (require authentication/cookies)
+    'https://www\.sciencedirect\.com/science/article/pii/S1053811921010387',
+    'https://www\.sciencedirect\.com/science/article/pii/S0010945221001106',
+    'https://www\.sciencedirect\.com/science/article/pii/S1388245717309069',
+
+    # Springer (503 errors but links work in browsers)
+    '^https?://link\.springer\.com/',
+
+    # INCF (certificate/network issues)
+    '^https?://.*\.incf\.org/',
+    '^https?://incf\.org/',
+    '^https?://neuroinformatics\.incf\.org/',
+
+    # DOI links that return 403 but work in browsers
+    '^https?://doi\.org/10\.1111/epi\.18113',
+
+    # NPM (blocks automated requests)
+    '^https?://.*\.npmjs\.com/package/hed-validator',
+
+    # MathWorks (blocks automated requests)
+    '^https?://.*\.mathworks\.com/',
+
+    # Brain Meeting poster links (expired/removed)
+    '^https?://brainmeeting.*\.ipostersessions\.com/',
+    '^https?://globalbrainconsortium\.org/documents/GBC_March-2023_Agenda_Annual_Meeting\.pdf',
+
+    # CANCTA network (authentication/access issues)
+    '^https?://.*\.cancta\.net/',
+    '^https?://cancta\.net/',
+
+    # GitHub discussions (programmatic access blocked)
+    '^https?://github\.com/hed-standard/hed-python/discussions',
+
+    # HED tools services (programmatic access blocked)
+    '^https?://hedtools\.org/hed/services_submit',
+
+    # Internal anchor links (false positives from lychee)
+    '(_anchor|-anchor)',
 ]
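
As a rough way to sanity-check a few of these patterns, the sketch below tries them against sample URLs with Python's `re` module. It assumes a URL is skipped when any pattern matches anywhere in it; lychee itself uses a Rust regex engine, so this approximates the simple patterns shown here and is not a description of lychee's implementation. The `is_excluded` helper and the sample URLs are made up for illustration.

```python
import re

# A subset of the exclude patterns from the hunk above.
exclude = [
    r'^https?://link\.springer\.com/',
    r'^https?://.*\.incf\.org/',
    r'(_anchor|-anchor)',
]


def is_excluded(url):
    """Return True if any exclude pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in exclude)


for url in [
    "https://link.springer.com/article/example",
    "https://www.incf.org/about",
    "https://incf.org/",  # no subdomain, so the '.*\.incf\.org' pattern alone misses it
    "https://example.org/page#intro-anchor",
]:
    print(url, is_excluded(url))
```

The bare `incf.org` case suggests why the config lists a separate `'^https?://incf\.org/'` entry alongside the subdomain pattern. The unanchored `(_anchor|-anchor)` pattern matches at any position in a URL, so it is broader than the anchored host patterns.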
