Skip to content

add MiloDE#1001

Merged
Zethson merged 11 commits into
mainfrom
feat/milode-de-nhoods
May 28, 2026
Merged

add MiloDE#1001
Zethson merged 11 commits into
mainfrom
feat/milode-de-nhoods

Conversation

@Zethson
Copy link
Copy Markdown
Member

@Zethson Zethson commented May 27, 2026

Closes #635.

Implements miloDE-style per-neighbourhood differential expression testing as Milo.de_nhoods. Pure Python, no edgeR.

API

import pertpy as pt
import scanpy as sc

adata = pt.dt.kang_2018()                       # 8 patients × ctrl/stim, raw counts in .X
adata.layers["counts"] = adata.X.copy()
adata.obs["sample"] = adata.obs["replicate"].astype(str) + "_" + adata.obs["label"].astype(str)
sc.pp.normalize_total(adata); sc.pp.log1p(adata); sc.pp.pca(adata)
sc.pp.neighbors(adata, use_rep="X_pca", n_neighbors=100)
sc.tl.umap(adata)

milo = pt.tl.Milo()
mdata = milo.load(adata)
milo.make_nhoods(mdata["rna"], prop=0.1)
mdata = milo.count_nhoods(mdata, sample_col="sample")
milo.build_nhood_graph(mdata)

# Per-nhood DE -- same column names as the rest of pertpy DE, plus the
# spatial-FDR axis miloDE introduces.
de = milo.de_nhoods(
    mdata,
    design="~label",
    column="label", baseline="ctrl", group_to_compare="stim",
    solver="pydeseq2",          # or "statsmodels" (auto NB-GLM + size-factor offset)
    layer="counts",
)
# Long DataFrame: nhood, variable, log_fc, p_value, adj_p_value,
# pval_corrected_across_nhoods, test_performed.

# Visualize one gene's per-nhood logFC -- one line:
milo.plot_de_nhood_graph(mdata, de, gene="ISG15", min_size=2)

# Per-nhood volcano with the existing DE plot (column names already match):
nh_df = de.loc[de["nhood"] == mdata["milo"].var_names[0]].dropna(subset=["adj_p_value"])
pt.tl.TTest(mdata["milo"]).plot_volcano(nh_df, log2fc_thresh=0.5)

milo.plot_de_nhood_graph(mdata, de, gene="ISG15") on kang_2018 (canonical IFN-β response gene; expected strong positive logFC in many nhoods):

nhood graph

What it does

For each nhood: pseudobulks cells by sample via sc.get.aggregate, fits the existing pertpy DE model (PyDESeq2 or Statsmodels), and applies the two-axis correction miloDE uses — BH across genes within each nhood, density-weighted BH across nhoods per gene (same correction da_nhoods uses, now factored into a shared _weighted_bh helper). Nhoods that fail validity (too few samples per condition, rank-deficient design) are skipped and flagged in test_performed; one summary log line is emitted at the end rather than one per skipped nhood.

plot_de_nhood_graph is the DE-side counterpart of plot_nhood_graph: takes the long DataFrame plus a gene name and renders the nhood graph colored by that gene's logFC, masking nhoods with pval_corrected_across_nhoods > alpha. The shared rendering body is factored into a private helper used by both.

Validation against miloDE R

Ran the R package on the same synthetic data + nhood structure (71 nhoods × 200 genes, planted DE in one population):

metric value
nhoods passing test_performed (R / Py / both) 71 / 71 / 71
Spearman(logFC) across genes per nhood median 0.996 (min 0.978)
Spearman(pvalue) across genes per nhood median 0.934
sign-concordance on R-significant entries (padj<0.1) 282 / 282 = 1.000
Spearman(pval_corrected_across_nhoods) per planted gene median 0.919

Remaining differences are NB-GLM Wald (PyDESeq2) vs edgeR's QLF on small samples.

Zethson and others added 2 commits May 27, 2026 17:05
For each nhood, pseudobulks cells by sample (sc.get.aggregate), fits the
existing PyDESeq2 or Statsmodels DE model, and applies the two-axis
correction miloDE uses: BH across genes within each nhood and
density-weighted BH across nhoods per gene. The weighted-BH used by
da_nhoods is factored out into a shared _weighted_bh helper.

Validated against miloDE R on synthetic data: median Spearman(logFC) =
0.996 across genes per nhood, 100% sign-concordance on R-significant
entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Zethson Zethson changed the title milo: add de_nhoods for per-neighbourhood DE (miloDE, pure Python) add MiloDE May 27, 2026
Zethson and others added 3 commits May 27, 2026 17:22
Switch return type from AnnData(n_nhoods × n_genes) to a long DataFrame
with columns (nhood, variable, log_fc, p_value, adj_p_value,
pval_corrected_across_nhoods, test_performed) — same shape pertpy's
other DE methods produce and the same shape miloDE R returns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pairs with de_nhoods the same way plot_nhood_graph pairs with da_nhoods:
takes the long DataFrame + a gene name and renders the nhood graph
colored by that gene's logFC, masking nhoods above the spatial-FDR
threshold. The rendering body shared with plot_nhood_graph is factored
into a private _render_nhood_graph helper to avoid duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Zethson and others added 6 commits May 27, 2026 17:47
Pulls in scverse/pertpy-tutorials#61 which appends a Milo.de_nhoods +
plot_de_nhood_graph demonstration to milo.ipynb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PyDESeq2 prints progress per fit; calling it once per nhood otherwise
floods stdout. Wrap the per-nhood model fit/test in
contextlib.redirect_stdout so de_nhoods runs cleanly.

Re-points the tutorials submodule to scverse/pertpy-tutorials@cffc9b5
which adds the miloDE section to milo.ipynb, with warnings filtered at
the top of the notebook so per-nhood validity messages and small-sample
PyDESeq2 warnings don't drown out cell outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scverse/pertpy-tutorials@9ff43c2 switches the miloDE tutorial section
to kang_2018 (8 patients × ctrl/stim, raw counts in .X) so the example
recovers ISG15 / ISG20 as top hits rather than mostly-NaN results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes plus a tutorials submodule bump.

de_nhoods previously stored p-values as float32, which underflows to
zero below ~1.4e-45. Strong DE signals (IFN-β ISGs in the kang_2018
example) routinely produce p ~ 1e-200 from PyDESeq2, which was getting
silently clamped to 0 in both the raw p_value and adj_p_value columns
(and hence pval_corrected_across_nhoods too). Storing p-values in
float64 fixes it; logFC stays float32.

Also collapses the per-nhood `Nhood X: DE test failed (...)` logger
warnings (one per skipped nhood) into a single end-of-run summary, so
the tutorial doesn't have to reach into pertpy._logger to silence
them.

Tutorials submodule bumps to scverse/pertpy-tutorials@a0b9e84, which
drops the pertpy._logger import from milo.ipynb and re-executes against
this fix (top hits now show p_value ~ 1e-73, not 0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The miloDE tutorial section landed via scverse/pertpy-tutorials#61
(squashed to d2ab4bf on main); move the pinned commit from the
feature-branch tip to the merged main commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches the tutorial example. The synthetic-data version was less
informative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Zethson Zethson merged commit 8a5c178 into main May 28, 2026
18 of 19 checks passed
@Zethson Zethson deleted the feat/milode-de-nhoods branch May 28, 2026 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement MiloDE (diff. expr. testing on top of milo framework)

1 participant