Survey Phase 7: CS IPW/DR covariates, repeated cross-sections, HonestDiD survey variance#240
Survey Phase 7: CS IPW/DR covariates, repeated cross-sections, HonestDiD survey variance#240
Conversation
…DiD survey variance Phase 7a: Remove NotImplementedError gate for IPW/DR + covariates + survey. Add DRDID panel nuisance IF corrections (PS + OR) for both survey and non-survey DR paths. Extract _safe_inv helper for matrix inversions. Phase 7d: Thread survey df through HonestDiD for t-distribution critical values. Compute full event-study VCV from influence function vectors. Add event_study_vcov to CallawaySantAnnaResults. Phase 7b: Add panel=False for repeated cross-section support in CallawaySantAnna. New _precompute_structures_rc, _compute_att_gt_rc, and three RC estimation methods (reg, ipw, dr) with covariates and survey weights. Canonical index abstraction in aggregation/bootstrap. RCS data generator in generate_staggered_data(panel=False). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR ReviewOverall assessment
Executive summary
Methodology Cross-check basis:
Code Quality No additional findings beyond the methodology issues above. Performance No findings. Maintainability
Tech Debt No separate findings. The new Security No findings. Documentation/Tests
Path to Approval
Static review only; I was not able to do runtime spot-checks in this environment because the Python runtime here is missing required dependencies such as |
… VCV, panel on results Fix 5 findings from PR #240 CI review: - Add cross-sectional nuisance IF corrections (PS + OR) to _ipw_estimation_rc and _doubly_robust_rc, matching panel path methodology - Use fixed full-sample cohort masses for unweighted RCS aggregation weights (consistency with WIF group-share denominator) - Guard replicate-weight designs from full event-study VCV (diagonal fallback) - Add panel field to CallawaySantAnnaResults, fix summary labels for RCS - Add panel to class docstring, replicate VCV test, RCS IF correction test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: PR ReviewOverall assessment
Executive Summary
Methodology Affected methods: Callaway-Sant'Anna repeated cross sections (
Code Quality No findings beyond the methodology issues above. Performance No findings. Maintainability No findings. Tech Debt No separate findings. Neither remaining P1 issue is tracked in TODO.md:L48-L60, so Security No findings. Documentation/Tests
Path to Approval
Static review only: I could not execute the test suite here because this environment lacks |
…equal-count tests - Use separate normalizer_t/normalizer_s in _doubly_robust_rc() IF corrections (post vs base period treated denominators must match the DR estimator) - Precompute rcs_cohort_masses in _precompute_structures_rc() and return cohort mass as n_treated from _compute_att_gt_rc() — fixes all downstream consumers (aggregation, bootstrap, balance_e) at the source instead of per-consumer patches - Remove now-unnecessary rcs_cohort_counts blocks from aggregation - Add unequal cohort count test fixture and regression tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: PR ReviewOverall assessment Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ulas
_outcome_regression_rc: Pool all treated obs for OR correction term
(was: separate per-period averages). Period-specific treated means for Y.
Matches Sant'Anna & Zhao (2020) Eq 2.2 / R reg_did_rc exactly.
_doubly_robust_rc: Fit 4 OLS models (control+treated, pre+post) for
locally efficient DR estimator (was: 2 control-only). Implements tau_1
(AIPW) + tau_2 (local efficiency adjustment) with full 11-component IF.
Matches Sant'Anna & Zhao (2020) Eq 3.3+3.4 / R drdid_rc exactly.
Add agg_weight field to group_time_effects for RCS aggregation weight
(cohort mass), separate from n_treated (per-cell display count).
Aggregation uses data.get("agg_weight", data["n_treated"]) for
backward compatibility with panel data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability
Tech Debt No separate findings. The P1 bootstrap-weight issue above is not currently tracked in TODO.md:L48-L60 or TODO.md:L167-L170, so it remains unmitigated. Security No findings. Documentation/Tests
Path to Approval
|
- Bootstrap overall/event-study reaggregation now uses agg_weight (fixed cohort mass) for panel=False, matching the analytical aggregation path - Reset self._event_study_vcov = None at start of fit() to prevent stale VCV from prior fit leaking into reused estimator objects Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Static review only: I could not execute the changed tests in this environment because the default Python interpreter here does not have the project dependencies available. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…dy SEs Prevents HonestDiD from mixing analytical IF-based VCV with bootstrap SEs on bootstrap-fit CallawaySantAnna results. When n_bootstrap>0, the event_study_vcov is set to None so HonestDiD falls back to diagonal from the bootstrap SEs (consistent variance path). Add regression test: bootstrap CS → HonestDiD asserts vcov is None. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Static review only: I could not execute the added tests here because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…s sum not mean - _outcome_regression_rc: M1 denominator changed from sum_w_D to n_all (matching R colMeans convention); inf_cont_2 / sum_w_D then gives correct single normalization by mean_w_D * n_all = sum_w_D - _ipw_estimation_rc: PS M2 uses np.sum/n_all instead of np.mean (which divided by n_ct/n_cs instead of n_all, under-scaling the correction) - _doubly_robust_rc: PS M2 already correct (np.sum/n_all), no change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review plus source cross-check only. I could not run Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Summary
NotImplementedErrorgate for CallawaySantAnna IPW/DR + covariates + survey. Implement DRDID panel nuisance IF corrections (propensity score + outcome regression) for both survey-weighted and non-survey DR paths (Sant'Anna & Zhao 2020, Theorem 3.1). Extract_safe_inv()helper for matrix inversions.event_study_vcovfield toCallawaySantAnnaResultsandsurvey_metadata/df_surveytoHonestDiDResults.panel=Falsefor repeated cross-section support in CallawaySantAnna. New_precompute_structures_rc(),_compute_att_gt_rc(), and three RC estimation methods (_outcome_regression_rc,_ipw_estimation_rc,_doubly_robust_rc) with covariates and survey weights. Canonical index abstraction in aggregation/bootstrap mixins. RCS data generator viagenerate_staggered_data(panel=False).Methodology references
Validation
tests/test_survey_phase7a.py(22 tests): smoke, scale invariance, uniform-weight equivalence, IF correction, aggregation, bootstrap, edge casestests/test_staggered_rc.py(23 tests): all methods, covariates, survey, aggregation, bootstrap, control groups, base periods, data generator, edge casestests/test_honest_did.py(+4 tests): survey df extraction, VCV computation, bounds widening, no-survey baselinetests/test_survey_phase4.py: 2 negative tests converted to positive assertionsSecurity / privacy
Generated with Claude Code