Address P3 findings for clean land

igerber · claude · igerber · commit 6218e45e3a7a · 2026-03-26T06:47:56.000-04:00
- Fix CS registry: clarify analytical vs bootstrap SE paths
- Update ImputationDiD/TwoStageDiD docstrings: survey bootstrap supported
- Add CS full-design aggregate='group' regression test (250 tests)
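The registry note describes two SE paths for aggregated CallawaySantAnna quantities. Analytical SEs (`n_bootstrap=0`) go through `compute_survey_if_variance()`. Bootstrap SEs (`n_bootstrap>0`) use PSU-level multiplier weights. The sketch below is not the library's code. It is a minimal pure-Python illustration of the textbook version of each idea, and it assumes three things: a per-unit influence function `psi` that is already scaled by the survey weights, PSUs nested within strata, and no FPC. All function names here (`psu_totals`, `centered_psu_terms`, `analytical_se`, `multiplier_bootstrap_se`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def psu_totals(psi, strata, psu):
    """Group per-unit influence values into PSU totals, keyed by stratum.

    Hypothetical helper: PSU labels are treated as nested within strata,
    so (stratum, psu) is the cluster key.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for value, h, c in zip(psi, strata, psu):
        totals[h][c] += value
    return {h: list(by_psu.values()) for h, by_psu in totals.items()}


def centered_psu_terms(psi, strata, psu):
    """PSU totals centered on their stratum mean, rescaled by sqrt(n_h / (n_h - 1))."""
    terms = []
    for z in psu_totals(psi, strata, psu).values():
        n_h = len(z)
        if n_h < 2:
            continue  # singleton stratum contributes no variance in this sketch
        z_bar = sum(z) / n_h
        scale = math.sqrt(n_h / (n_h - 1))
        terms.extend(scale * (z_i - z_bar) for z_i in z)
    return terms


def analytical_se(psi, strata, psu):
    """Stratified, PSU-clustered linearization SE without FPC (sketch)."""
    return math.sqrt(sum(t * t for t in centered_psu_terms(psi, strata, psu)))


def multiplier_bootstrap_se(psi, strata, psu, n_bootstrap=2000, seed=0):
    """PSU-level multiplier bootstrap SE with Rademacher weights (E[w]=0, Var(w)=1)."""
    rng = random.Random(seed)
    terms = centered_psu_terms(psi, strata, psu)
    sq_sum = 0.0
    for _ in range(n_bootstrap):
        # Draw one weight per PSU and perturb the centered PSU totals.
        draw = sum(rng.choice((-1.0, 1.0)) * t for t in terms)
        sq_sum += draw * draw
    # Draws are centered at zero by construction, so the SE is their RMS.
    return math.sqrt(sq_sum / n_bootstrap)


if __name__ == "__main__":
    rng = random.Random(42)
    strata, psu, psi = [], [], []
    for h in range(4):          # 4 strata
        for c in range(5):      # 5 PSUs per stratum (labels reused across strata)
            for _ in range(6):  # 6 units per PSU
                strata.append(h)
                psu.append(c)
                psi.append(rng.gauss(0.0, 1.0) / 120)
    print("analytical SE:", analytical_se(psi, strata, psu))
    print("bootstrap SE: ", multiplier_bootstrap_se(psi, strata, psu))
```

Each Rademacher weight has mean 0 and variance 1. As a result, the expected squared bootstrap draw equals the sum of the squared rescaled PSU terms, and that sum is exactly the analytical variance. The `sqrt(n_h / (n_h - 1))` rescaling is what makes the two paths agree in expectation. With an FPC, both formulas would also need a per-stratum `(1 - f_h)` factor, which this sketch omits.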

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/imputation.py b/diff_diff/imputation.py
@@ -208,7 +208,7 @@ def fit(
             pweight only (aweight/fweight raise ValueError). FPC raises
             NotImplementedError. PSU is used as cluster variable for Theorem 3
             variance. Strata enters survey df for t-distribution inference.
-            Requires analytical inference (n_bootstrap=0).
+            Both analytical (n_bootstrap=0) and bootstrap inference are supported.
 
         Returns
         -------
@@ -1973,7 +1973,7 @@ def imputation_did(
         pweight only (aweight/fweight raise ValueError). FPC raises
         NotImplementedError. PSU is used as cluster variable for Theorem 3
         variance. Strata enters survey df for t-distribution inference.
-        Requires analytical inference (n_bootstrap=0).
+        Both analytical (n_bootstrap=0) and bootstrap inference are supported.
     **kwargs
         Additional keyword arguments passed to ImputationDiD constructor.
 
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
@@ -204,7 +204,7 @@ def fit(
             pweight only (aweight/fweight raise ValueError). FPC raises
             NotImplementedError. PSU is used as cluster variable for Theorem 3
             variance. Strata enters survey df for t-distribution inference.
-            Requires analytical inference (n_bootstrap=0).
+            Both analytical (n_bootstrap=0) and bootstrap inference are supported.
 
         Returns
         -------
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -418,7 +418,7 @@ The multiplier bootstrap uses random weights w_i with E[w]=0 and Var(w)=1:
     not-yet-treated cohorts serve as controls for each other (requires ≥2 cohorts)
 - **Note:** CallawaySantAnna survey support: weights, strata, PSU, and FPC are all supported. Analytical (`n_bootstrap=0`): aggregated SEs use design-based variance via `compute_survey_if_variance()`. Bootstrap (`n_bootstrap>0`): PSU-level multiplier weights replace analytical SEs for aggregated quantities. Regression method supports covariates; IPW/DR support no-covariate only (covariates+IPW/DR raises NotImplementedError — DRDID nuisance IF not yet implemented). Survey weights compose with IPW weights multiplicatively. WIF in aggregation matches R's did::wif() formula. Per-unit survey weights are extracted via `groupby(unit).first()` from the panel-normalized pweight array; on unbalanced panels the pweight normalization (`w * n_obs / sum(w)`) preserves relative unit weights since all IF/WIF formulas use weight ratios (`sw_i / sum(sw)`) where the normalization constant cancels. Scale-invariance tests pass on both balanced and unbalanced panels.
 - **Note (deviation from R):** CallawaySantAnna survey reg+covariates per-cell SE uses a conservative plug-in IF based on WLS residuals. The treated IF is `inf_treated_i = (sw_i/sum(sw_treated)) * (resid_i - ATT)` (normalized by treated weight sum, matching unweighted `(resid-ATT)/n_t`). The control IF is `inf_control_i = -(sw_i/sum(sw_control)) * wls_resid_i` (normalized by control weight sum, matching unweighted `-resid/n_c`). SE is computed as `sqrt(sum(sw_t_norm * (resid_t - ATT)^2) + sum(sw_c_norm * resid_c^2))`, the weighted analogue of the unweighted `sqrt(var_t/n_t + var_c/n_c)`. This omits the semiparametrically efficient nuisance correction from DRDID's `reg_did_panel` — WLS residuals are orthogonal to the weighted design matrix by construction, so the first-order IF term is asymptotically valid but may be conservative. SEs pass weight-scale-invariance tests. The efficient DRDID correction is deferred to future work.
-- **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. When strata/PSU/FPC are present, aggregated SEs are computed via PSU-level multiplier bootstrap (see bootstrap + survey note above) rather than analytical Taylor-series linearization on the combined IF/WIF.
+- **Note (deviation from R):** Per-cell ATT(g,t) SEs under survey weights use influence-function-based variance (matching R's `did::att_gt` analytical SE path) rather than full Taylor-series linearization. When strata/PSU/FPC are present, analytical aggregated SEs (`n_bootstrap=0`) use `compute_survey_if_variance()` on the combined IF/WIF; bootstrap aggregated SEs (`n_bootstrap>0`) use PSU-level multiplier weights.
 
 **Reference implementation(s):**
 - R: `did::att_gt()` (Callaway & Sant'Anna's official package)
diff --git a/tests/test_survey_phase4.py b/tests/test_survey_phase4.py
@@ -880,6 +880,24 @@ def test_aggregate_group_with_survey(self, staggered_survey_data, survey_design_
         assert result.group_effects is not None
         assert len(result.group_effects) > 0
 
+    def test_aggregate_group_full_design(self, staggered_survey_data):
+        """aggregate='group' with full design uses design-based SEs."""
+        sd_full = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        result = CallawaySantAnna(estimation_method="reg").fit(
+            staggered_survey_data,
+            "outcome",
+            "unit",
+            "period",
+            "first_treat",
+            aggregate="group",
+            survey_design=sd_full,
+        )
+        assert result.group_effects is not None
+        assert len(result.group_effects) > 0
+        for info in result.group_effects.values():
+            assert np.isfinite(info["effect"])
+            assert np.isfinite(info["se"])
+
     def test_aggregate_all_with_survey(self, staggered_survey_data, survey_design_weights_only):
         """aggregate='all' works with weights-only survey design."""
         result = CallawaySantAnna(estimation_method="reg").fit(