igerber
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 29 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 8 additions & 0 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 22 additions & 20 deletions
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
Lines changed: 11 additions & 1 deletion
@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [2.7.6] - 2026-03-28
+
+### Added
+- **AI practitioner guardrails** based on Baker et al. (2025) "Difference-in-Differences Designs: A Practitioner's Guide"
+  - `practitioner.py` module with 8-step workflow enforcement for AI agents
+  - Estimator-specific handlers ensuring correct diagnostic ordering (pre-trends before estimation, Bacon decomposition before estimator selection)
+  - `docs/llms.txt`, `docs/llms-practitioner.txt`, `docs/llms-full.txt` for AI agent discoverability
+  - Evaluation rubric (`docs/practitioner-guide-evaluation.md`) with correctness-aware scoring
+- **Survey Phase 6: Advanced features**
+  - Survey-aware bootstrap for all 8 bootstrap-using estimators (PSU-level multiplier for CS/Imputation/TwoStage/Continuous/Efficient; Rao-Wu rescaled for SA/SyntheticDiD/TROP)
+  - Replicate weight variance estimation (BRR, Fay's BRR, JK1, JKn) for OLS-based and IF-based estimators
+  - Per-coefficient DEFF diagnostics comparing survey vs SRS variance
+  - Subpopulation analysis via `SurveyDesign.subpopulation()` preserving full design structure
+  - CS analytical expansion: strata/PSU/FPC for aggregated SEs via `compute_survey_if_variance()`
+  - TROP cross-classified pseudo-strata for survey-aware bootstrap
+
+### Changed
+- Estimator-specific guidance for parallel trends tests and placebo checks (no shared templates)
+- SDiD and TROP split into separate decision tree branches in practitioner workflow
+
+### Fixed
+- Fix replicate weight df calculation using pivoted QR rank with R-compatible tolerance
+- Fix replicate IF variance score scaling for EfficientDiD, TripleDiff, ContinuousDiD
+- Fix panel-to-unit replicate weight propagation and normalization
+- Fix CS zero-mass return type and vectorized guard for survey paths
+- Fix `solve_logit` effective-sample validation for zero-weight designs
+- Fix subpopulation mask validation and EfficientDiD bootstrap guard
+
 ## [2.7.5] - 2026-03-23
 
 ### Added
@@ -978,6 +1006,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - `to_dict()` and `to_dataframe()` export methods
   - `is_significant` and `significance_stars` properties
 
+[2.7.6]: https://github.com/igerber/diff-diff/compare/v2.7.5...v2.7.6
 [2.7.5]: https://github.com/igerber/diff-diff/compare/v2.7.4...v2.7.5
 [2.7.4]: https://github.com/igerber/diff-diff/compare/v2.7.3...v2.7.4
 [2.7.3]: https://github.com/igerber/diff-diff/compare/v2.7.2...v2.7.3
 
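Note on the CHANGELOG.md hunk above: the replicate-weight variance methods it lists (BRR, Fay's BRR, JK1) differ, in their textbook form, only in the scale factor applied to the squared deviations of the replicate estimates from the full-sample estimate. The sketch below illustrates that idea, together with the design effect as the ratio of survey variance to SRS variance. It is not taken from `diff_diff`: the names `replicate_variance` and `design_effect` and their parameters are hypothetical, and JKn is left out because it needs per-stratum replicate counts.

```python
def replicate_variance(theta_full, theta_reps, method="BRR", fay_rho=0.0):
    """Textbook replicate-weight variance of a scalar estimate.

    Hypothetical helper for illustration only; not the diff_diff code.
    theta_full: estimate computed with the full-sample weights.
    theta_reps: estimates computed with each set of replicate weights.
    """
    n_reps = len(theta_reps)
    if n_reps < 2:
        raise ValueError("need at least two replicate estimates")

    # Sum of squared deviations from the full-sample estimate.
    sum_sq = sum((t - theta_full) ** 2 for t in theta_reps)

    if method == "BRR":
        scale = 1.0 / n_reps
    elif method == "Fay":
        if not 0.0 <= fay_rho < 1.0:
            raise ValueError("fay_rho must lie in [0, 1)")
        scale = 1.0 / (n_reps * (1.0 - fay_rho) ** 2)
    elif method == "JK1":
        scale = (n_reps - 1) / n_reps
    else:
        raise ValueError(f"unsupported method: {method}")
    return scale * sum_sq


def design_effect(var_survey, var_srs):
    """DEFF: design-based variance divided by the SRS variance."""
    return var_survey / var_srs


if __name__ == "__main__":
    reps = [1.0, 3.0]
    print(replicate_variance(2.0, reps, method="BRR"))
    print(replicate_variance(2.0, reps, method="Fay", fay_rho=0.5))
    print(replicate_variance(2.0, reps, method="JK1"))
```

With Fay's method, setting `fay_rho=0.0` reduces the scale factor to the plain BRR one, which is why the two are often handled by a single code path.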
@@ -138,6 +138,7 @@ category (`Methodology/Correctness`, `Performance`, or `Testing/Docs`):
 | `CONTRIBUTING.md` | Documentation requirements, test writing guidelines |
 | `.claude/commands/dev-checklists.md` | Checklists for params, methodology, warnings, reviews, bugs (run `/dev-checklists`) |
 | `.claude/memory.md` | Debugging patterns, tolerances, API conventions (git-tracked) |
+| `docs/llms-practitioner.txt` | Baker et al. (2025) 8-step practitioner workflow for AI agents |
 | `docs/performance-plan.md` | Performance optimization details |
 | `docs/benchmarks.rst` | Validation results vs R |
 
 
@@ -67,6 +67,14 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 ======================================================================
 ```
 
+## For AI Agents
+
+If you are an AI agent or LLM using this library, read [`docs/llms.txt`](docs/llms.txt) for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
+
+After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
+
+Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
+
 ## Features
 
 - **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()`
 
@@ -19,25 +19,28 @@ Current limitations that may affect users:
 
 ### Large Module Files
 
-Target: < 1000 lines per module for maintainability.
+Target: < 1000 lines per module for maintainability. Updated 2026-03-29.
 
 | File | Lines | Action |
 |------|-------|--------|
-| `trop.py` | 862 | Split done — trop_global.py (990), trop_local.py (1000) |
-| `utils.py` | 1838 | Monitor |
-| `staggered.py` | 1785 | Monitor |
-| `imputation.py` | 1756 | Monitor |
-| `visualization.py` | 1727 | Monitor — growing but cohesive |
-| `linalg.py` | 1727 | Monitor — unified backend, splitting would hurt cohesion |
-| `triple_diff.py` | 1581 | Monitor |
+| `power.py` | 2588 | Consider splitting (power analysis + MDE + sample size) |
+| `linalg.py` | 2289 | Monitor — unified backend, splitting would hurt cohesion |
+| `staggered.py` | 2275 | Monitor — grew with survey support |
+| `imputation.py` | 2009 | Monitor |
+| `triple_diff.py` | 1921 | Monitor |
+| `utils.py` | 1902 | Monitor |
+| `two_stage.py` | 1708 | Monitor |
+| `survey.py` | 1646 | Monitor — grew with Phase 6 features |
+| `continuous_did.py` | 1626 | Monitor |
 | `honest_did.py` | 1511 | Acceptable |
-| `two_stage.py` | 1451 | Acceptable |
-| `power.py` | 1350 | Acceptable |
-| `prep.py` | 1242 | Acceptable |
-| `sun_abraham.py` | 1162 | Acceptable |
-| `continuous_did.py` | 1155 | Acceptable |
-| `estimators.py` | 1147 | Acceptable |
-| `pretrends.py` | 1104 | Acceptable |
+| `sun_abraham.py` | 1540 | Acceptable |
+| `estimators.py` | 1357 | Acceptable |
+| `trop_local.py` | 1261 | Acceptable |
+| `trop_global.py` | 1251 | Acceptable |
+| `prep.py` | 1225 | Acceptable |
+| `pretrends.py` | 1105 | Acceptable |
+| `trop.py` | 981 | Split done — trop_global.py + trop_local.py |
+| `visualization/` | 4172 | Subpackage (split across 7 files) — OK |
 
 ---
 
@@ -49,8 +52,7 @@ Deferred items from PR reviews that were not addressed before merge.
 
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
-| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
-| EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
+| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 | Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
@@ -75,11 +77,11 @@ Deferred items from PR reviews that were not addressed before merge.
 
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
+| Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
 | R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low |
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
-| ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` | — | Low |
-| Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
-| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | — | Resolved |
+| ~376 `duplicate object description` Sphinx warnings — restructure `docs/api/*.rst` to avoid duplicate `:members:` + `autosummary` | `docs/api/*.rst` | — | Low |
+| Doc-snippet smoke tests only cover `.rst` files; `.txt` AI guides outside CI validation | `tests/test_doc_snippets.py` | #239 | Low |
 
 ---
 
 
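Note on the TODO.md hunk above: the resolved replicate-weight item records the rule `df_survey = rank(replicate_weights) - 1`. A minimal sketch of that rule follows, under stated assumptions: the rank is computed here by Gaussian elimination with partial pivoting and a fixed absolute tolerance, not by the pivoted QR with R-compatible tolerance that the changelog describes, and the names `matrix_rank` and `replicate_df` are hypothetical.

```python
def matrix_rank(rows, tol=1e-10):
    """Numerical rank via Gaussian elimination with partial pivoting.

    Illustrative stand-in for a pivoted-QR rank; `tol` is an assumed
    absolute tolerance, not the R-compatible one used by the library.
    """
    m = [[float(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is numerically dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def replicate_df(replicate_weights):
    """Degrees of freedom as rank(replicate_weights) - 1."""
    return matrix_rank(replicate_weights) - 1


if __name__ == "__main__":
    # Four observations, three replicate columns; the third column is the
    # sum of the first two, so only two columns are independent.
    weights = [
        [1, 0, 1],
        [0, 1, 1],
        [1, 1, 2],
        [2, 0, 2],
    ]
    print(replicate_df(weights))
```

Using the rank rather than the raw replicate count keeps linearly dependent replicate columns from inflating the degrees of freedom.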
@@ -3,6 +3,13 @@
 
 This library provides sklearn-like estimators for causal inference
 using the difference-in-differences methodology.
+
+For rigorous analysis, follow the 8-step practitioner workflow in
+docs/llms-practitioner.txt (based on Baker et al. 2025). After
+estimation, call ``practitioner_next_steps(results)`` for context-aware
+guidance on remaining diagnostic steps.
+
+AI agent reference: docs/llms.txt
 """
 
 # Import backend detection from dedicated module (avoids circular imports)
@@ -168,6 +175,7 @@
     plot_staircase,
     plot_synth_weights,
 )
+from diff_diff.practitioner import practitioner_next_steps
 from diff_diff.datasets import (
     clear_cache,
     list_datasets,
@@ -193,7 +201,7 @@
 Bacon = BaconDecomposition
 EDiD = EfficientDiD
 
-__version__ = "2.7.5"
+__version__ = "2.7.6"
 __all__ = [
     # Estimators
     "DifferenceInDifferences",
@@ -344,4 +352,6 @@
     "load_dataset",
     "list_datasets",
     "clear_cache",
+    # Practitioner guidance
+    "practitioner_next_steps",
 ]