Fix CS aggregate='group' to use design-based survey SE path

igerber · claude · igerber · commit 4c63569d0f8a · 2026-03-26T06:30:45.000-04:00
P1: _aggregate_by_group now calls _compute_aggregated_se_with_wif (matching overall/event_study paths) for design-based variance
P3: Narrow R subbootstrap match claim to no-FPC case
P3: Update TODO.md CS resolved entry with PR #237

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/TODO.md b/TODO.md
@@ -52,7 +52,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
-| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs now use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #233 | Resolved |
+| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
 | EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
diff --git a/diff_diff/staggered.py b/diff_diff/staggered.py
@@ -1494,6 +1494,8 @@ def fit(
                 influence_func_info,
                 treatment_groups,
                 precomputed=precomputed,
+                df=df,
+                unit=unit,
             )
 
         # Run bootstrap inference if requested
diff --git a/diff_diff/staggered_aggregation.py b/diff_diff/staggered_aggregation.py
@@ -645,17 +645,19 @@ def _aggregate_by_group(
         influence_func_info: Dict,
         groups: List[Any],
         precomputed: Optional["PrecomputedData"] = None,
+        df: Optional[pd.DataFrame] = None,
+        unit: Optional[str] = None,
     ) -> Dict[Any, Dict[str, Any]]:
         """
         Aggregate effects by treatment cohort.
 
         Computes average effect for each cohort across all post-treatment periods.
 
-        Standard errors use influence function aggregation to account for
-        covariances across time periods within a cohort.
+        Standard errors use influence function aggregation with WIF adjustment
+        to account for covariances across time periods within a cohort.
+        When a full survey design is present in precomputed, uses design-based
+        variance via compute_survey_if_variance().
         """
-        n_units = len(precomputed["all_units"]) if precomputed is not None else None
-
         # Collect all group aggregation data first
         group_data_list = []
         for g in groups:
@@ -682,8 +684,11 @@ def _aggregate_by_group(
             weights = np.ones(len(effs)) / len(effs)
             agg_effect = np.sum(weights * effs)
 
-            agg_se = self._compute_aggregated_se(
-                gt_pairs, weights, influence_func_info, n_units=n_units
+            # Use WIF-adjusted SE (with survey design support)
+            groups_for_gt = np.array([gg for (gg, t) in gt_pairs])
+            agg_se = self._compute_aggregated_se_with_wif(
+                gt_pairs, weights, effs, groups_for_gt,
+                influence_func_info, df, unit, precomputed
             )
             group_data_list.append((g, agg_effect, agg_se, len(g_effects)))
 
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -1965,8 +1965,9 @@ ContinuousDiD, EfficientDiD):
 - **Note:** Bootstrap paths support `lonely_psu="remove"` and `"certainty"` only.
   `lonely_psu="adjust"` raises `NotImplementedError` for survey-aware bootstrap;
   use analytical inference for designs requiring `adjust` semantics.
-- **Deviation from R:** R `survey::as.svrepdesign(type="subbootstrap")` uses the same
-  formula. Our implementation matches.
+- **Deviation from R:** For the no-FPC case (`m_h = n_h - 1`), this matches R
+  `survey::as.svrepdesign(type="subbootstrap")`. The FPC-adjusted resample size
+  `m_h = round((1-f_h)*(n_h-1))` follows Rao, Wu & Yue (1992) Section 3.
 
 **CallawaySantAnna Design-Based Aggregated SEs**:
 

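
Reviewer note on the REGISTRY.md hunk: the resample-size formula `m_h = round((1-f_h)*(n_h-1))` can be checked by hand with a few lines of Python. This is only an illustrative sketch, not code from this PR. The function name `rao_wu_resample_size` and its argument checks are hypothetical, and it assumes `n_h` is the number of PSUs in stratum `h` and `f_h` is that stratum's sampling fraction. It uses Python's built-in `round`, which rounds exact halves to the nearest even integer; the library's rounding may differ.

```python
def rao_wu_resample_size(n_h: int, f_h: float = 0.0) -> int:
    """Per-stratum resample size m_h = round((1 - f_h) * (n_h - 1)).

    Hypothetical helper for illustration only (not part of diff_diff).
    n_h: number of PSUs in the stratum (assumed >= 2).
    f_h: sampling fraction in [0, 1]; 0.0 means no FPC.
    """
    if n_h < 2:
        raise ValueError("need at least two PSUs in the stratum")
    if not 0.0 <= f_h <= 1.0:
        raise ValueError("sampling fraction must lie in [0, 1]")
    # With f_h == 0 this reduces to n_h - 1, the no-FPC case in the note above.
    return round((1.0 - f_h) * (n_h - 1))


no_fpc = rao_wu_resample_size(10)          # (1 - 0.0) * 9 = 9
with_fpc = rao_wu_resample_size(10, 0.2)   # (1 - 0.2) * 9 = 7.2, rounds to 7
```

With `f_h = 0` the result is `n_h - 1`, which is the case the updated note says matches R's subbootstrap. Any positive sampling fraction shrinks the resample size.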