Commit 8ca9bf7

Add comprehensive test suites for mapping accounting and identifier resolution
- Introduced tests for the mapping_accounting module, covering various scenarios including many-to-one mappings, collision detection, and mapping statistics.
- Added tests for the orn_identifier module, focusing on normalization, candidate generation, and identifier resolution.
- Implemented tests for receptor identifier normalization utilities and receptor inventory construction, ensuring proper handling of various input formats and schema validation.
1 parent 862c0f2 commit 8ca9bf7

41 files changed

Lines changed: 10635 additions & 293 deletions

.gitignore

Lines changed: 9 additions & 4 deletions
@@ -145,13 +145,18 @@ Thumbs.db
 test_cache/
 test_data/
 
-data/
+# Ignore bulk data by default, but allow publication-critical mapping artifacts.
+# NOTE: Do not ignore the `data/` directory itself, otherwise `!data/mappings/**`
+# cannot re-include tracked files.
+data/*
+
+# Keep small, publication-critical mapping artifacts tracked
+!data/mappings/
+!data/mappings/**
 
 helper-code/
 
 output/
 behavioral_prediction_results/
 
-tests/
-
-flywire_mb_analysis/
+flywire_mb_analysis/

CHANGELOG.md

Lines changed: 13 additions & 3 deletions
@@ -5,13 +5,23 @@ All notable changes to this project will be documented in this file. The format
 ## [Unreleased]
 
 ### Added
-- _Placeholder_ – add new entries here.
+- Authoritative DoOR→FlyWire mapping system with provenance + strict validations (`src/door_toolkit/integration/door_to_flywire_mapping.py`).
+- Publication-critical mapping artifacts tracked under `data/mappings/`:
+  - `data/mappings/door_to_flywire_mapping.csv`
+  - `data/mappings/door_to_flywire_manual_overrides.csv`
+  - `data/mappings/sensillum_to_receptor_reference.csv`
+- Mapping pipeline documentation: `docs/DOOR_TO_FLYWIRE_MAPPING.md`.
 
 ### Changed
-- _Placeholder_ – record behaviour changes here.
+- `data/mappings/receptor_inventory.csv` is generated from the authoritative mapping artifact and defines “mapped” as “mapped to a valid `ORN_` FlyWire label” (not passthrough strings).
+- Adult-only filtering now follows the DoOR.mappings larval-only flags (`adult=False`, `larva=True`; DoOR 2.0, DOI: 10.1038/srep21841) throughout integration and inventory.
+- `.gitignore` now tracks `data/mappings/**` while continuing to ignore bulk `data/*`.
 
 ### Fixed
-- _Placeholder_ – document bug fixes here.
+- Corrected known mapping mismatches with explicit provenance:
+  - `Or10a → ORN_DL1` (DoOR.mappings; Münch & Galizia 2016, DOI: 10.1038/srep21841)
+  - `Ir64a.DC4 → ORN_DC4` and `Ir64a.DP1m → ORN_DP1m` (DoOR dotted-suffix convention)
+- Prevented ambiguous multi-glomerulus DoOR units (e.g., `DM5+DM3`, `DL2d/v`) from being silently treated as single mappings in adult analyses.
 
 ## [1.0.0] - 2025-12-17
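
The two "Fixed" entries in the CHANGELOG hunk above (dotted-suffix names and ambiguous multi-glomerulus units) can be illustrated with a minimal sketch. The function `resolve_door_unit` and its rules are assumptions made for illustration only; they are not taken from `door_to_flywire_mapping.py`, whose source is not shown in this commit view.

```python
from typing import Optional


def resolve_door_unit(door_name: str, door_glomerulus: str) -> Optional[str]:
    """Hypothetical helper: return an ORN_ label, or None if the unit is ambiguous.

    Assumed rules (not confirmed by the toolkit source):
    - a dotted suffix in the DoOR name (e.g. "Ir64a.DC4") names the glomerulus;
    - a glomerulus string containing "+" or "/" spans several glomeruli.
    """
    if "." in door_name:
        # Dotted-suffix convention: the part after the first dot wins.
        door_glomerulus = door_name.split(".", 1)[1]
    if "+" in door_glomerulus or "/" in door_glomerulus:
        # Refuse to collapse a multi-glomerulus unit into a single mapping.
        return None
    return f"ORN_{door_glomerulus}"


print(resolve_door_unit("Or10a", "DL1"))         # ORN_DL1
print(resolve_door_unit("Ir64a.DC4", "ignored"))  # ORN_DC4
print(resolve_door_unit("unit_x", "DM5+DM3"))     # None ("unit_x" is a placeholder name)
```

Returning `None` rather than guessing forces the caller to decide explicitly how an ambiguous unit is handled, which matches the stated intent of not treating such units as single mappings silently.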

README.md

Lines changed: 56 additions & 0 deletions
@@ -251,6 +251,19 @@ Our analysis revealed:
 - **VM7v acts as convergence hub** receiving from multiple glomeruli
 - **Asymmetric connectivity** patterns suggesting specialized functions
 
+### ORN/Glomerulus Identifier Resolution
+
+The connectomics module includes a **robust identifier resolution system** that automatically normalizes messy ORN/glomerulus names and maps receptor names to their glomerulus names.
+
+**Key features:**
+- **Format-agnostic**: Accepts `"DL3"`, `"dl3"`, `"ORN_DL3"`, `"ORN-DL3"`, `"Glomerulus DL3"` - all resolve to `"ORN_DL3"`
+- **Receptor-to-glomerulus mapping**: Automatically maps `"Or7a"` → `"ORN_DL5"`, `"Ir31a"` → `"ORN_VL2p"`, `"Gr21a"` → `"ORN_V"`
+- **Complete coverage**: Includes 44 receptors (33 Or, 10 Ir, 1 Gr) mapped to their FlyWire glomeruli
+- **Fuzzy matching**: Suggests alternatives when exact matches fail (ranked by similarity)
+- **Clear errors**: Provides actionable error messages with top 10 suggestions
+
+In FlyWire, neurons are labeled by glomerulus name (e.g., `ORN_VL2p; Ir31a`), not receptor name. The resolver automatically handles this translation so you can use familiar receptor names like `"Ir31a"` or `"Or7a"` in your code. The system uses normalization (case-insensitive, separator-agnostic) combined with receptor mapping and fuzzy matching to prevent "non-matching ORN name" errors. All pathway analysis functions (`analyze_single_orn`, `compare_orn_pair`, `find_pathways`) accept both receptor names and glomerulus names. See [`examples/connectomics/example_orn_identifier_resolution.py`](examples/connectomics/example_orn_identifier_resolution.py) for a complete demonstration.
+
 ---
 
 ## FlyWire Integration
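
The resolution steps described in the README hunk above (normalize, translate receptor to glomerulus, fall back to ranked suggestions) might look roughly like the sketch below. It is not the `orn_identifier` module: `resolve`, `_key`, `KNOWN_LABELS`, and `RECEPTOR_TO_GLOMERULUS` are hypothetical names, the tables hold only the examples quoted in the README, and the standard-library `difflib` stands in for whatever similarity ranking the toolkit actually uses.

```python
import difflib
import re

# Toy subset of labels and receptor mappings, taken from the README examples.
KNOWN_LABELS = ["ORN_DL3", "ORN_DL5", "ORN_VL2p", "ORN_V"]
RECEPTOR_TO_GLOMERULUS = {"or7a": "DL5", "ir31a": "VL2p", "gr21a": "V"}


def _key(text: str) -> str:
    """Case-insensitive, separator-agnostic key with any ORN/Glomerulus prefix removed."""
    text = re.sub(r"^\s*(?:orn|glomerulus)[\s_\-]*", "", text, flags=re.IGNORECASE)
    return re.sub(r"[\s_\-]+", "", text).lower()


_INDEX = {_key(label): label for label in KNOWN_LABELS}


def resolve(name: str, n_suggestions: int = 10) -> str:
    """Return the canonical ORN_ label, or raise KeyError listing close matches."""
    key = _key(name)
    # Receptor names are translated to their glomerulus before the lookup.
    key = _key(RECEPTOR_TO_GLOMERULUS.get(key, key))
    if key in _INDEX:
        return _INDEX[key]
    close = difflib.get_close_matches(key, list(_INDEX), n=n_suggestions, cutoff=0.0)
    suggestions = [_INDEX[k] for k in close]
    raise KeyError(f"Unknown ORN identifier {name!r}; closest matches: {suggestions}")


for raw in ["DL3", "dl3", "ORN_DL3", "ORN-DL3", "Glomerulus DL3", "Or7a", "Ir31a", "Gr21a"]:
    print(raw, "->", resolve(raw))
```

Building one normalized index up front keeps every accepted spelling on the same lookup path, so the fuzzy fallback only runs when an exact normalized match fails.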
@@ -925,6 +938,48 @@ tracer.export_metrics_csv([metrics], "connectivity_metrics.csv")
 - `metrics.alpha_beta_fraction`: Fraction in appetitive lobe (0-1)
 - `metrics.circuit_score`: Overall connectivity score (0-1)
 
+### Mapping Accounting
+
+**IMPORTANT:** Prevents confusion between receptor counts and unique glomerulus counts in many-to-one mappings.
+
+```python
+from door_toolkit.integration.mapping_accounting import (
+    compute_mapping_stats,
+    format_mapping_summary,
+    log_mapping_stats,
+    write_mapping_stats_json
+)
+
+# Compute comprehensive mapping statistics
+mapping = {'OR82A': 'VA6', 'OR94A': 'VA6', 'OR7A': 'DL5'} # Example with collision
+stats = compute_mapping_stats(
+    mapping,
+    note="Example mapping",
+    adult_only=False # Include larval receptors
+)
+
+# Get compact summary
+summary = format_mapping_summary(stats)
+# "3 receptors → 2 unique glomeruli (1 collision)"
+
+# Check for many-to-one collapses
+if stats['collision_count'] > 0:
+    print(f"Collisions: {stats['collision_summary']}")
+    # ['VA6: OR82A, OR94A']
+
+# Write JSON artifact for reproducibility
+write_mapping_stats_json("mapping_stats.json", stats)
+```
+
+**Key Stats Returned:**
+- `n_receptors_mapped`: Number of receptor genes successfully mapped
+- `n_unique_glomeruli_from_mapped_receptors`: Number of distinct glomeruli (may differ!)
+- `collision_count`: Number of glomeruli with ≥2 receptors (many-to-one)
+- `collisions`: Dict of glomerulus → [receptor list] for collisions
+- `collision_summary`: Human-readable collision descriptions
+
+📚 **See:** [docs/RECEPTOR_GLOMERULUS_MAPPING_ACCOUNTING.md](docs/RECEPTOR_GLOMERULUS_MAPPING_ACCOUNTING.md) for complete documentation on preventing receptor vs glomerulus count confusion.
+
 ---
 
 ## Examples
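
To make the receptor-versus-glomerulus distinction in the hunk above concrete, here is a small stand-alone sketch of how such statistics could be derived. `sketch_mapping_stats` is a hypothetical stand-in, not `compute_mapping_stats`; it reuses the key names listed under "Key Stats Returned" but ignores options such as `note` and `adult_only`.

```python
from collections import defaultdict


def sketch_mapping_stats(mapping):
    """Hypothetical stand-in: count receptors, unique glomeruli, and many-to-one collisions."""
    by_glomerulus = defaultdict(list)
    for receptor, glomerulus in mapping.items():
        by_glomerulus[glomerulus].append(receptor)

    # A collision is any glomerulus that two or more receptors map to.
    collisions = {g: sorted(rs) for g, rs in by_glomerulus.items() if len(rs) >= 2}
    return {
        "n_receptors_mapped": len(mapping),
        "n_unique_glomeruli_from_mapped_receptors": len(by_glomerulus),
        "collision_count": len(collisions),
        "collisions": collisions,
        "collision_summary": [f"{g}: {', '.join(rs)}" for g, rs in sorted(collisions.items())],
    }


stats = sketch_mapping_stats({"OR82A": "VA6", "OR94A": "VA6", "OR7A": "DL5"})
print(stats["n_receptors_mapped"], stats["n_unique_glomeruli_from_mapped_receptors"])
print(stats["collision_summary"])
```

With the README's example mapping, three receptors collapse onto two glomeruli, which is the discrepancy the accounting module is meant to surface.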
@@ -941,6 +996,7 @@ Complete working examples are available in the `examples/` directory:
 - `examples/connectomics/example_2_orn_pair_comparison.py` - Mode 2: ORN pair comparison
 - `examples/connectomics/example_3_full_network_analysis.py` - Mode 3: Full network view
 - `examples/connectomics/example_4_pathway_search.py` - Mode 4: Pathway search
+- `examples/connectomics/example_orn_identifier_resolution.py` - Robust identifier resolution demo
 - `examples/connectomics/analyze_data_characteristics.py` - Data quality analysis
 
 ### Advanced Examples
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+door_name,flywire_glomerulus,source_name,source_year,source_url_or_doi,evidence_note,confidence,is_ambiguous
+Or10a,ORN_DL1,"DoOR.mappings (DoOR.data v2.0.0; Münch & Galizia 2016)",2016,10.1038/srep21841,"Override: legacy toolkit CSV mapped Or10a to VC3l; DoOR.mappings assigns glomerulus DL1 (ab1D).",high,No
+Ir64a.DC4,ORN_DC4,"DoOR.mappings (DoOR.data v2.0.0; Münch & Galizia 2016)",2016,10.1038/srep21841,"Override: enforce dotted-suffix consistency (Ir64a.DC4 → DC4) and FlyWire ORN_ prefix.",high,No
+Ir64a.DP1m,ORN_DP1m,"DoOR.mappings (DoOR.data v2.0.0; Münch & Galizia 2016)",2016,10.1038/srep21841,"Override: enforce dotted-suffix consistency (Ir64a.DP1m → DP1m) and FlyWire ORN_ prefix.",high,No
+Or22b,ORN_DM2,"DoOR 2.0 receptor→glomerulus mapping (Münch & Galizia 2016)",2016,10.1038/srep21841,"Mapped at glomerulus level (DM2). Note: FlyWire ORN_ labels are glomerulus-level; Or22a/Or22b gene-level distinction is not captured in ORN_DM2. Optional strict single-cell mode can exclude Or22b as sensitivity analysis.",medium,No
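
A CSV with the columns shown above could be loaded and sanity-checked along the following lines. This is only a sketch: `load_overrides` and the three checks it performs (required columns, `ORN_` prefix, unique `door_name`) are assumptions about what "strict validations" might include, not the toolkit's actual validation rules. The sample text reuses the header and the first two rows from the hunk.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "door_name", "flywire_glomerulus", "source_name", "source_year",
    "source_url_or_doi", "evidence_note", "confidence", "is_ambiguous",
]

SAMPLE = """door_name,flywire_glomerulus,source_name,source_year,source_url_or_doi,evidence_note,confidence,is_ambiguous
Or10a,ORN_DL1,"DoOR.mappings (DoOR.data v2.0.0; Münch & Galizia 2016)",2016,10.1038/srep21841,"Override: legacy toolkit CSV mapped Or10a to VC3l; DoOR.mappings assigns glomerulus DL1 (ab1D).",high,No
Ir64a.DC4,ORN_DC4,"DoOR.mappings (DoOR.data v2.0.0; Münch & Galizia 2016)",2016,10.1038/srep21841,"Override: enforce dotted-suffix consistency (Ir64a.DC4 → DC4) and FlyWire ORN_ prefix.",high,No
"""


def load_overrides(text):
    """Hypothetical loader: return {door_name: ORN_ label} after basic checks."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    overrides = {}
    for row in reader:
        name, label = row["door_name"], row["flywire_glomerulus"]
        if not label.startswith("ORN_"):
            raise ValueError(f"{name}: {label!r} is not an ORN_ label")
        if name in overrides:
            raise ValueError(f"duplicate override for {name}")
        overrides[name] = label
    return overrides


overrides = load_overrides(SAMPLE)
print(overrides)
```

Failing loudly on a malformed row, instead of skipping it, keeps a manual override from being dropped unnoticed.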
