From 64006c89e0451112b21a59a30b604e62199ccd6c Mon Sep 17 00:00:00 2001 From: igerber Date: Sun, 29 Mar 2026 09:32:31 -0400 Subject: [PATCH] Clean up TODO.md: remove resolved items, update line counts - Remove 7 resolved tech debt entries that were cluttering the table - Update all module line counts (power.py now 2588, staggered.py 2275) - Remove stale EfficientDiD docs TODO (tutorial + API docs already exist) - Add Status column to feature gaps table linking to Survey Phase 7a/7b - Promote Plotly kwargs issue to top of Testing/Docs (only Medium item) - Flag power.py for potential splitting (largest file at 2588 lines) Co-Authored-By: Claude Opus 4.6 (1M context) --- TODO.md | 69 ++++++++++++++++++++++++++------------------------------- 1 file changed, 32 insertions(+), 37 deletions(-) diff --git a/TODO.md b/TODO.md index ba67261e..264e76ae 100644 --- a/TODO.md +++ b/TODO.md @@ -19,25 +19,28 @@ Current limitations that may affect users: ### Large Module Files -Target: < 1000 lines per module for maintainability. +Target: < 1000 lines per module for maintainability. Updated 2026-03-29. | File | Lines | Action | |------|-------|--------| -| `trop.py` | 862 | Split done — trop_global.py (990), trop_local.py (1000) | -| `utils.py` | 1838 | Monitor | -| `staggered.py` | 1785 | Monitor | -| `imputation.py` | 1756 | Monitor | -| `visualization.py` | 1727 | Monitor — growing but cohesive | -| `linalg.py` | 1727 | Monitor — unified backend, splitting would hurt cohesion | -| `triple_diff.py` | 1581 | Monitor | +| `power.py` | 2588 | Consider splitting (power analysis + MDE + sample size) | +| `linalg.py` | 2289 | Monitor — unified backend, splitting would hurt cohesion | +| `staggered.py` | 2275 | Monitor — grew with survey support | +| `imputation.py` | 2009 | Monitor | +| `triple_diff.py` | 1921 | Monitor | +| `utils.py` | 1902 | Monitor | +| `two_stage.py` | 1708 | Monitor | +| `survey.py` | 1646 | Monitor — grew with Phase 6 features | +| `continuous_did.py` | 1626 | Monitor | | `honest_did.py` | 1511 | Acceptable | -| `two_stage.py` | 1451 | Acceptable | -| `power.py` | 1350 | Acceptable | -| `prep.py` | 1242 | Acceptable | -| `sun_abraham.py` | 1162 | Acceptable | -| `continuous_did.py` | 1155 | Acceptable | -| `estimators.py` | 1147 | Acceptable | -| `pretrends.py` | 1104 | Acceptable | +| `sun_abraham.py` | 1540 | Acceptable | +| `estimators.py` | 1357 | Acceptable | +| `trop_local.py` | 1261 | Acceptable | +| `trop_global.py` | 1251 | Acceptable | +| `prep.py` | 1225 | Acceptable | +| `pretrends.py` | 1105 | Acceptable | +| `trop.py` | 981 | Split done — trop_global.py + trop_local.py | +| `visualization/` | 4172 | Subpackage (split across 7 files) — OK | --- @@ -49,20 +52,13 @@ Deferred items from PR reviews that were not addressed before merge. | Issue | Location | PR | Priority | |-------|----------|----|----------| -| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) | -| EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium | +| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) | | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium | -| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved | -| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved | -| CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium | -| SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved | -| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved | -| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low | -| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low | -| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved | -| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low | -| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved | -| TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low | +| CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works. | `staggered.py` | #233 | Medium — tracked in Survey Phase 7a | +| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. Needs design decision for `anticipation>0`. | `efficient_did.py` | #230 | Low | +| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. | `prep_dgp.py`, `power.py` | #208 | Low | +| Survey design resolution/collapse patterns inconsistent across panel estimators — extract shared helpers for panel-to-unit collapse, post-filter re-resolution, metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low | +| TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low | #### Performance @@ -75,12 +71,11 @@ Deferred items from PR reviews that were not addressed before merge. | Issue | Location | PR | Priority | |-------|----------|----|----------| +| Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium | | R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low | | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low | -| ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` | — | Low | -| Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium | -| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | — | Resolved | -| Doc-snippet smoke tests only cover `.rst` files; new `.txt` AI guides are outside CI validation | `tests/test_doc_snippets.py` | #239 | Low | +| ~376 `duplicate object description` Sphinx warnings — restructure `docs/api/*.rst` to avoid duplicate `:members:` + `autosummary` | `docs/api/*.rst` | — | Low | +| Doc-snippet smoke tests only cover `.rst` files; `.txt` AI guides outside CI validation | `tests/test_doc_snippets.py` | #239 | Low | --- @@ -164,11 +159,11 @@ Spurious RuntimeWarnings ("divide by zero", "overflow", "invalid value") are emi Features in R's `did` package that block porting additional tests: -| Feature | R tests blocked | Priority | -|---------|----------------|----------| -| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | Medium | -| Sampling/population weights | 7 tests incl. all JEL replication | Medium | -| Calendar time aggregation | 1 test in test-att_gt.R | Low | +| Feature | R tests blocked | Priority | Status | +|---------|----------------|----------|--------| +| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | High | Planned — Survey Phase 7b | +| Sampling/population weights | 7 tests incl. all JEL replication | Medium | Mostly resolved (Phases 1-6); CS IPW/DR + covariates + survey in Phase 7a | +| Calendar time aggregation | 1 test in test-att_gt.R | Low | | ---