NB02 GBM(eng)−GBM(flat) lift goes non-positive on scikit-learn 1.9.0 #114

@shaypal5

Summary

The "Execute release notebooks (G13.1)" CI gate has been failing since scikit-learn 1.9.0 shipped (first observed 2026-06-10). The assertion headline_lift_auc > 0.0 in notebook 02 (02_relational_feature_engineering.ipynb) fails because HistGradientBoostingClassifier behaves differently in 1.9.0, which CI installs through the uncapped >=1.3 requirement, than in 1.7.x, which is used for local development.

Observed values (sklearn 1.9.0, seed 42, intermediate bundle)

gbm_flat_auc:      observed=0.6339  target=0.6023  |diff|=0.0316 > tol=0.0200
headline_lift_auc: observed=-0.0253 target=0.0110  |diff|=0.0363 > tol=0.0150

The flat GBM improved substantially, but the engineered-feature GBM did not keep pace, so the GBM(eng)−GBM(flat) lift went from +0.0110 to -0.0253.
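The arithmetic behind the two failing checks can be replayed with a minimal sketch. The observed values, targets, and tolerances are copied from the log above; the dictionary layout and the helper names `check_metric` and `implied_eng_auc` are hypothetical and are not taken from the notebook. The engineered-feature AUC is not reported in the log, so it is derived here on the assumption that the headline lift is exactly GBM(eng) AUC minus GBM(flat) AUC.

```python
# Values copied from the CI log; names and structure are illustrative only.
OBSERVED = {"gbm_flat_auc": 0.6339, "headline_lift_auc": -0.0253}
TARGETS = {
    "gbm_flat_auc": (0.6023, 0.0200),       # (target, tolerance)
    "headline_lift_auc": (0.0110, 0.0150),  # (target, tolerance)
}


def check_metric(name):
    """Return (|observed - target| rounded to 4 places, within-tolerance flag)."""
    target, tol = TARGETS[name]
    diff = abs(OBSERVED[name] - target)
    return round(diff, 4), diff <= tol


def implied_eng_auc(flat_auc, lift):
    """Assumes lift = eng_auc - flat_auc, so eng_auc = flat_auc + lift."""
    return round(flat_auc + lift, 4)


for name in TARGETS:
    diff, ok = check_metric(name)
    print(f"{name}: |diff|={diff:.4f} within_tol={ok}")

observed_eng = implied_eng_auc(OBSERVED["gbm_flat_auc"], OBSERVED["headline_lift_auc"])
target_eng = implied_eng_auc(TARGETS["gbm_flat_auc"][0], TARGETS["headline_lift_auc"][0])
print(f"implied eng AUC: observed={observed_eng:.4f} target={target_eng:.4f}")
```

Under that assumption the implied engineered-feature AUC is about 0.6086 against an implied target of about 0.6133, so the engineered model moved only slightly while the flat model gained 0.0316.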

Immediate fix (PR #115)

Pinned scikit-learn<1.9 in [dev], [scripts], and [notebooks] extras of pyproject.toml. This unblocks CI while the underlying issue is investigated.
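While the pin is in place, a notebook or script could guard against running on an unpinned environment. The sketch below is an assumption, not part of PR #115: `parse_major_minor` and `within_pin` are hypothetical helpers, and only the major and minor components are compared, which is a simplification of full PEP 440 version matching.

```python
def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.7.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def within_pin(version: str, lower=(1, 3), upper=(1, 9)) -> bool:
    """True if lower <= version < upper, mirroring '>=1.3,<1.9'."""
    return lower <= parse_major_minor(version) < upper


if __name__ == "__main__":
    try:
        import sklearn  # only needed for the live check
        print(sklearn.__version__, within_pin(sklearn.__version__))
    except ImportError:
        print("scikit-learn is not installed in this environment")
```

Comparing integer tuples rather than strings keeps a future 1.10 release from being treated as older than 1.9.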

Root cause / next steps to resolve the pin

  1. Identify which HistGradientBoostingClassifier default changed in 1.9, then decide whether to recalibrate the targets or to revise the engineered features so that they provide positive lift on 1.9.
  2. Once targets are updated or features revised, remove the <1.9 upper bound and rerun.
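One possible way to start on step 1 is to dump the estimator's default parameters in each environment and diff the two dumps. This is a sketch under assumptions: `dump_hgb_defaults` and `diff_params` are hypothetical helpers, the file paths are placeholders, and a behavior change that is not exposed as a constructor default would not show up in this comparison.

```python
import json


def diff_params(old: dict, new: dict) -> dict:
    """Return {name: (old_value, new_value)} for keys that differ or are missing."""
    missing = "<absent>"
    keys = sorted(set(old) | set(new))
    return {
        k: (old.get(k, missing), new.get(k, missing))
        for k in keys
        if old.get(k, missing) != new.get(k, missing)
    }


def dump_hgb_defaults(path: str) -> None:
    """Write the default constructor parameters to a JSON file.

    Run once per environment, e.g. under 1.7.x locally and under 1.9.0 in CI.
    """
    from sklearn.ensemble import HistGradientBoostingClassifier

    params = HistGradientBoostingClassifier().get_params()
    with open(path, "w") as fh:
        # default=repr covers any value that is not JSON-serializable.
        json.dump(params, fh, default=repr, sort_keys=True, indent=2)


def load_and_diff(old_path: str, new_path: str) -> dict:
    """Load two dumps produced by dump_hgb_defaults and compare them."""
    with open(old_path) as f_old, open(new_path) as f_new:
        return diff_params(json.load(f_old), json.load(f_new))
```

If the diff comes back empty, the change is more likely in the algorithm itself than in a default value, and the scikit-learn changelog for the affected releases would be the next place to look.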

The notebook comment at line 27 (+0.0147 headline lift) will also need updating when the issue is resolved.

Labels: bug, type: ci
