Commit 92b4d97

Merge pull request #238 from igerber/survey-last-phase
Add survey Phase 6: replicate weights, DEFF diagnostics, subpopulation analysis
2 parents c693675 + b26eeaa commit 92b4d97

36 files changed: 3611 additions & 386 deletions

TODO.md

Lines changed: 5 additions & 4 deletions
@@ -52,15 +52,16 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
+| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1`, matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce the effective count. | `survey.py` | #238 | Resolved |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` || Resolved |
-| EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
+| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
 | TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
-| ContinuousDiD event-study aggregation does not filter by `anticipation` — uses all (g,t) cells instead of anticipation-filtered subset; pre-existing in both survey and non-survey paths | `continuous_did.py` | #226 | Medium |
+| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| Duplicated survey metadata summary formatting across 6 results classes — extract shared `_format_survey_metadata(sm, width)` helper to reduce maintenance burden as more estimators gain survey support in Phases 4-5 | `results.py`, `stacked_did_results.py`, `sun_abraham.py`, `bacon.py`, `triple_diff.py`, `continuous_did_results.py`, `efficient_did_results.py` | #226 | Low |
+| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | | Resolved |
 | TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` || Low |
 
 #### Performance
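The resolved hausman_pretest() row above turns on a single NumPy idiom: when `row_finite` filtering drops whole clusters, `np.unique(..., return_inverse=True)` recomputes the effective cluster count and remaps cluster ids to a dense range in one call. A minimal sketch under assumed toy data (variable names are illustrative, not taken from `efficient_did.py`):

```python
import numpy as np

# Six EIF rows in three clusters; a row_finite filter drops two rows,
# which happens to eliminate cluster 20 entirely.
clusters = np.array([10, 10, 20, 30, 30, 20])
row_finite = np.array([True, True, False, True, True, False])

kept = clusters[row_finite]
# One call recomputes the effective cluster count AND remaps the
# surviving labels to a dense 0..n_cl-1 index range.
uniq, remapped = np.unique(kept, return_inverse=True)
n_cl = uniq.size  # 2, not the stale pre-filter count of 3
```

Using the stale pre-filter count would deflate the clustered covariance's small-sample correction, which is why the fix recomputes it after filtering.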
@@ -78,7 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
 | ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` || Low |
 | Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
-| Survey bootstrap test coverage: add FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests | `tests/test_survey_phase*.py` | #237 | Medium |
+| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | | Resolved |
 
 ---
 
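The replicate-weight survey df row in the TODO table above reduces to a one-line rank computation. A hedged sketch of that rule (the helper name `replicate_survey_df` is hypothetical; the merged implementation lives in `survey.py`):

```python
import numpy as np

def replicate_survey_df(rep_weights):
    # df = rank(replicate-weight matrix) - 1, the rule the resolved TODO
    # row attributes to R's survey::degf() for replicate designs.
    return int(np.linalg.matrix_rank(np.asarray(rep_weights, dtype=float))) - 1

# 6 observations x 4 replicate columns; generic random weights are full
# column rank, so df = 4 - 1 = 3.
rng = np.random.default_rng(0)
reps = rng.uniform(0.5, 2.0, size=(6, 4))
df = replicate_survey_df(reps)
```

Duplicated replicate columns lower the rank, so this df is robust to degenerate replicate sets in a way that "number of replicates minus one" is not.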
diff_diff/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -95,8 +95,10 @@
     SyntheticDiDResults,
 )
 from diff_diff.survey import (
+    DEFFDiagnostics,
     SurveyDesign,
     SurveyMetadata,
+    compute_deff_diagnostics,
 )
 from diff_diff.staggered import (
     CallawaySantAnna,
@@ -327,6 +329,8 @@
     # Survey support
     "SurveyDesign",
     "SurveyMetadata",
+    "DEFFDiagnostics",
+    "compute_deff_diagnostics",
     # Rust backend
     "HAS_RUST_BACKEND",
     # Linear algebra helpers
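This diff only re-exports `DEFFDiagnostics` and `compute_deff_diagnostics`; their internals are not shown here. The standard weighting-only design effect such diagnostics typically build on is Kish's formula, deff = n·Σw² / (Σw)², with effective sample size n_eff = n / deff. A self-contained sketch under that assumption (the function name is illustrative, not the package API):

```python
import numpy as np

def kish_deff(w):
    # Kish's approximate design effect from unequal weighting alone:
    #   deff = n * sum(w^2) / (sum(w))^2,   n_eff = n / deff
    # Equal weights give deff == 1; variability in w inflates it.
    w = np.asarray(w, dtype=float)
    n = w.size
    deff = n * np.sum(w**2) / np.sum(w) ** 2
    return deff, n / deff

deff, n_eff = kish_deff([1.0, 1.0, 1.0, 3.0])
# one unit carrying half the weight mass inflates deff to 4/3,
# shrinking the effective sample from 4 to 3
```

Note this captures only the unequal-weighting component; clustering and stratification effects require the full design information (strata/PSU/FPC) that `SurveyDesign` carries.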

diff_diff/bacon.py

Lines changed: 32 additions & 18 deletions
@@ -17,6 +17,7 @@
 import numpy as np
 import pandas as pd
 
+from diff_diff.results import _format_survey_block
 from diff_diff.utils import within_transform as _within_transform_util
 
 
@@ -144,23 +145,7 @@ def summary(self) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<35} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<35} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<35} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<35} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<35} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<35} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         lines.extend(
             [
@@ -477,6 +462,13 @@ def fit(
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, "analytical")
         )
+        # Reject replicate-weight designs — Bacon decomposition is a
+        # diagnostic that does not compute replicate-based variance
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "BaconDecomposition does not support replicate-weight survey "
+                "designs. Use a TSL-based survey design (strata/psu/fpc)."
+            )
 
         # Validate within-unit constancy for exact survey weights only.
         # The exact-weight path collapses to per-unit weights via groupby().first(),
@@ -593,6 +585,13 @@ def fit(
                 weights=survey_weights,
             )
 
+        if not comparisons:
+            raise ValueError(
+                "No valid 2x2 comparisons remain after filtering. "
+                "All cells have zero effective weight or insufficient data. "
+                "Check subpopulation/domain definition."
+            )
+
         # Normalize weights to sum to 1
         total_weight = sum(c.weight for c in comparisons)
         if total_weight > 0:
@@ -849,13 +848,21 @@ def _compute_treated_vs_never(
     never_post_mask = never_mask & df[time].isin(post_periods)
 
     # Guard against empty cells (unbalanced/filtered panels)
+    # Also check positive weight mass for survey/subpopulation designs
     if not (
         np.any(treated_pre_mask)
         and np.any(treated_post_mask)
         and np.any(never_pre_mask)
        and np.any(never_post_mask)
     ):
         return None
+    if (
+        np.sum(w[treated_pre_mask]) <= 0
+        or np.sum(w[treated_post_mask]) <= 0
+        or np.sum(w[never_pre_mask]) <= 0
+        or np.sum(w[never_post_mask]) <= 0
+    ):
+        return None
 
     treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
     treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
@@ -966,14 +973,21 @@ def _compute_timing_comparison(
     control_pre_mask = control_mask & df[time].isin(pre_periods)
     control_post_mask = control_mask & df[time].isin(post_periods)
 
-    # Skip if any cell is empty
+    # Skip if any cell is empty or has zero effective weight
     if (
         treated_pre_mask.sum() == 0
         or treated_post_mask.sum() == 0
         or control_pre_mask.sum() == 0
         or control_post_mask.sum() == 0
     ):
         return None
+    if (
+        np.sum(w[treated_pre_mask]) <= 0
+        or np.sum(w[treated_post_mask]) <= 0
+        or np.sum(w[control_pre_mask]) <= 0
+        or np.sum(w[control_post_mask]) <= 0
+    ):
+        return None
 
     treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
     treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])