diff --git a/differential_gene_expression.ipynb b/differential_gene_expression.ipynb
index 5874f3d..2ee48fe 100644
--- a/differential_gene_expression.ipynb
+++ b/differential_gene_expression.ipynb
@@ -12,13 +12,19 @@
    "metadata": {},
    "source": [
     "Differential gene expression (DGE) analysis identifies genes that show statistically significant differences in expression levels across distinct cell populations or conditions.\n",
-    "This analysis helps in identifying which cell types are most affected by a condition of interest such as a disease, and characterizing their functional signatures. \n",
+    "This analysis helps in identifying which cell types are most affected by a condition of interest such as a disease, and characterizing their functional signatures.\n",
     "\n",
-    "Pertpy provides an API to access several types of models for differential expression analysis.\n",
-    "The first group of models comprises the T-test and Wilcoxon test as simple statistical tests for comparing expression values between two groups without covariates.\n",
-    "The second group includes models of the linear model family that allow modeling complex designs and contrasts. Currently included are [PyDESeq2](https://academic.oup.com/bioinformatics/article/39/9/btad547/7260507), [edgeR](https://academic.oup.com/bioinformatics/article/26/1/139/182458) as well as a wrapper around statsmodels [Statsmodels](https://www.statsmodels.org). which provides access to a wide range of regression models, including ordinary least squares regression, robust linear models and generalized linear models.\n",
+    "Pertpy provides a unified API to several families of DGE models so you can pick the one that fits your design:\n",
     "\n",
-    "In the following tutorial we will demonstrate how the edgeR interface can be used to model complex interactions using the triple-negative breast cancer (TNBC) [Zhang dataset](https://www.sciencedirect.com/science/article/pii/S1535610821004992)."
+    "- **Simple statistical tests** ({class}`~pertpy.tools.TTest`, {class}`~pertpy.tools.WilcoxonTest`) compare two groups directly on the expression matrix.\n",
+    "  They are fast, assumption-light, and a reasonable choice when you only have a single binary condition and no covariates to adjust for.\n",
+    "- **Pseudobulk + count-based GLMs** ({class}`~pertpy.tools.EdgeR`, {class}`~pertpy.tools.PyDESeq2`) aggregate cells into per-sample pseudobulks and fit a negative-binomial GLM.\n",
+    "  This is the recommended approach for multi-sample studies with covariates and the [current best practice](https://www.sc-best-practices.org/conditions/differential_gene_expression.html) for scRNA-seq DE because it controls false positives caused by treating individual cells as independent replicates.\n",
+    "  {class}`~pertpy.tools.EdgeR` calls into the R package via `rpy2`; {class}`~pertpy.tools.PyDESeq2` is a pure-Python reimplementation of DESeq2.\n",
+    "- **Generic linear models** ({class}`~pertpy.tools.Statsmodels`) wrap {mod}`statsmodels` and expose OLS, robust linear models, and GLMs.\n",
+    "  Use this when your response variable does not look like a count (for example, log-normalised expression or a continuous score).\n",
+    "\n",
+    "In the following tutorial we will demonstrate how the edgeR and PyDESeq2 interfaces can be used to model complex interactions using the triple-negative breast cancer (TNBC) [Zhang dataset](https://www.sciencedirect.com/science/article/pii/S1535610821004992).\n"
    ]
   },
   {
@@ -948,14 +954,21 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<!-- The interface for edgeR supports complex designs. -->\n",
-    "Here, we are fit a model to capture that the type of treatment (`Treatment`) and response to the treatment (`Efficacy`) contribute independently to the gene expression levels.\n",
+    "A linear model needs a **design matrix** that encodes, for each sample, the values of the covariates we want to control for or test against.\n",
+    "Pertpy lets you describe the design with a [Wilkinson formula](https://matthewwardrop.github.io/formulaic/latest/formulas/) — the same syntax used by R and patsy.\n",
+    "A few examples:\n",
+    "\n",
+    "- `~ Treatment` — intercept plus a coefficient per treatment level (the baseline level is absorbed into the intercept).\n",
+    "- `~ Treatment + Efficacy` — both covariates as **additive**, independent effects.\n",
+    "- `~ Treatment + Efficacy + Treatment:Efficacy` (equivalently `~ Treatment * Efficacy`) — additive effects plus an **interaction** term that asks whether the effect of one covariate depends on the level of the other.\n",
+    "\n",
+    "Here we start with the additive model: the type of treatment (`Treatment`) and response to the treatment (`Efficacy`) contribute independently to gene expression.\n",
     "By doing so, we can evaluate:\n",
     "\n",
     "1. How much of the variation in gene expression can be attributed to differences in drug efficacy, independent of the type of treatment.\n",
     "2. How much of the variation is due to the type of treatment, independent of the efficacy of the drug.\n",
     "\n",
-    "This setup helps in understanding not just whether a treatment works, but how its effectiveness might vary or be influenced by the inherent efficacy of the drug."
+    "This setup helps in understanding not just whether a treatment works, but how its effectiveness might vary or be influenced by the inherent efficacy of the drug.\n"
    ]
   },
   {
@@ -1012,7 +1025,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To now determine the differentially expressed genes between the treatments, we can specify a contrast as follows:"
+    "Fitting the model gave us one coefficient per term in the design.\n",
+    "To turn those coefficients into a biological comparison we evaluate a **contrast**: a vector with one entry per coefficient that says \"take this combination of fitted effects and test whether it differs from zero\".\n",
+    "For a simple two-group comparison along a single column, {meth}`~pertpy.tools.EdgeR.contrast` builds the right vector automatically.\n",
+    "We will assemble more complex contrasts by hand further down.\n"
    ]
   },
   {
@@ -1209,6 +1225,25 @@
     "res_df.head(10)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The result table has one row per gene.\n",
+    "The key columns to look at:\n",
+    "\n",
+    "- `variable` — gene symbol.\n",
+    "- `log_fc` — log2 fold change between the two contrasted groups.\n",
+    "  Positive values mean the gene is up in `group_to_compare` relative to `baseline`; negative values mean it is down.\n",
+    "- `p_value` — raw p-value from the test that `log_fc` differs from zero.\n",
+    "  Useful for diagnostics, but do not threshold on this directly.\n",
+    "- `adj_p_value` — p-value after Benjamini–Hochberg adjustment for multiple testing.\n",
+    "  This is the value to threshold on (commonly `< 0.05`).\n",
+    "- `contrast` — identifier of the contrast when multiple are tested at once; `None` here because we only ran one contrast.\n",
+    "\n",
+    "A standard summary is \"significantly differentially expressed\" = `adj_p_value < 0.05` **and** `|log_fc|` above some effect-size threshold (the volcano plot below uses `log2fc_thresh` for the latter).\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1241,6 +1276,16 @@
     "edgr.plot_volcano(res_df, log2fc_thresh=0)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once you have a ranked gene table, a typical next step is **gene-set / pathway enrichment** to turn per-gene statistics into per-pathway statistics.\n",
+    "[decoupler](https://decoupler.readthedocs.io/) integrates well with the result tables produced here.\n",
+    "For example, you can pass the `log_fc` column from `res_df` (indexed by `variable`) into {any}`decoupler.mt.ulm` together with a prior-knowledge network from [OmniPath](https://omnipathdb.org/) ({func}`~decoupler.op.collectri`, {func}`~decoupler.op.progeny`, {func}`~decoupler.op.hallmark`, …) to score transcription-factor or pathway activities.\n",
+    "We do not run an enrichment analysis here to keep this tutorial focused on the DE step; decoupler's [bulk functional analysis tutorial](https://decoupler.readthedocs.io/en/latest/notebooks/bulk.html) is a good starting point.\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
diff --git a/distance_tests.ipynb b/distance_tests.ipynb
index 23c7ad2..a138ab7 100644
--- a/distance_tests.ipynb
+++ b/distance_tests.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Pertpy offers several [Distances](https://pertpy.readthedocs.io/en/latest/usage/tools/pertpy.tools.Distance.html#pertpy.tools.Distance) to compute distances between groups of cells.\n",
+    "Pertpy offers several {class}`~pertpy.tools.Distance` distance metrics to compute distances between groups of cells.\n",
     "To determine whether two groups came from the same distribution as evaluated by the Distance metric, pertpy provides Monte-Carlo permutation tests.\n",
     "This can be a valuable tool to assess whether a treatment had a significant effect on the transcript profiles of the cells."
    ]
diff --git a/distances.ipynb b/distances.ipynb
index 8427cd1..0907586 100644
--- a/distances.ipynb
+++ b/distances.ipynb
@@ -20,7 +20,7 @@
     "\n",
     "Other implemented distance functions have different ways of incorporating the distribution of single cells into the distance calculation.\n",
     "\n",
-    "Please refer to [Distance](https://pertpy.readthedocs.io/en/latest/usage/tools/pertpy.tools.Distance.html#pertpy.tools.Distance) for a complete list of available distances."
+    "Please refer to {class}`~pertpy.tools.Distance` for a complete list of available distances."
    ]
   },
   {
@@ -697,7 +697,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "See the [statistical testing notebook](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/distance_tests.html) for an example."
+    "See the {doc}`./distance_tests` for an example."
    ]
   }
  ],
diff --git a/expression_prediction_evaluation.ipynb b/expression_prediction_evaluation.ipynb
index 8e49051..23dfd0e 100644
--- a/expression_prediction_evaluation.ipynb
+++ b/expression_prediction_evaluation.ipynb
@@ -7,7 +7,7 @@
    "source": [
     "# Evaluation of expression prediction \n",
     "\n",
-    "Various models for prediction of gene expression upon perturbation or across modalities have been proposed, such as scGEN (see [pertpy tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/scgen_perturbation_prediction.html)), GEARS, chemCPA, PerturbNet, scPreGAN, and others. They are commonly evaluated with ground-truth data to which predictions are compared with various metrics that capture expression distribution similarity or truthfulness of downstream results based on DE genes or embeddings. To assess how different metrics correspond to data changes, we performed simulations (using SPARSim package) and computed the pertpy metrics on different simulated datasets. Below we first discuss characteristics of alternative metrics that can be used for model evaluation and then show how pertpy may be used to evaluate new model predictions."
+    "Various models for prediction of gene expression upon perturbation or across modalities have been proposed, such as scGEN (see {doc}`./scgen_perturbation_prediction`), GEARS, chemCPA, PerturbNet, scPreGAN, and others. They are commonly evaluated with ground-truth data to which predictions are compared with various metrics that capture expression distribution similarity or truthfulness of downstream results based on DE genes or embeddings. To assess how different metrics correspond to data changes, we performed simulations (using SPARSim package) and computed the pertpy metrics on different simulated datasets. Below we first discuss characteristics of alternative metrics that can be used for model evaluation and then show how pertpy may be used to evaluate new model predictions."
    ]
   },
   {
@@ -29,7 +29,7 @@
     "- We simulated two groups of cells. This can be done by using the Cauchy distribution of LFCs, sampling the LFCs to add upon the parameters of one condition (e.g. ground truth or control) to obtain the second condition (e.g. prediction or perturbed). \n",
     "- We simulated different distances between the two groups. By increasing or decreasing the width of the Cauchy distribution we simulated smaller or larger differences between the two cell groups. Below we show how different metrics behave in the presence of small and large differences.\n",
     "\n",
-    "For a description of the here used metrics please refer to the [pertpy documentation](https://pertpy.readthedocs.io/en/latest/usage/tools/pertpy.tools.Distance.html) and [distance tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/distances.html). Lower distance score corresponds to higher similarity."
+    "For a description of the here used metrics please refer to the {class}`~pertpy.tools.Distance` and {doc}`./distances`. Lower distance score corresponds to higher similarity."
    ]
   },
   {
@@ -207,7 +207,7 @@
    "id": "a7a4b42d-89ea-4f2d-a3f6-3bdf811c114d",
    "metadata": {},
    "source": [
-    "For simplicity, we use mock data in a format similar to `eval_adata` from [scGEN tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/scgen_perturbation_prediction.html). Briefly, the adata should contain normalized log-transformed expression in X and two obs keys: cell_type and condition (ground truth control and perturbed as well as prediction of the perturbed condition). \n",
+    "For simplicity, we use mock data in a format similar to `eval_adata` from {doc}`./scgen_perturbation_prediction`. Briefly, the adata should contain normalized log-transformed expression in X and two obs keys: cell_type and condition (ground truth control and perturbed as well as prediction of the perturbed condition). \n",
     "\n",
     "The metrics are of relative nature and thus their interpretation relies on comparison across predictions obtained with different modeling settings (models, hyperparameters, training data, etc.) in order to select the best one. Thus, we created multiple predicted groups that correspond to better or worse prediction of the perturbed state. In particular, we generated one group that closely mimics the target, one that has expression shifted towards the input control, and one that has under-estimated variance. Please note that these simulations are over-simplified and we advise you to use a package designed specifically for scRNA-seq expression simulation, such as SPARSim, if you want to further evaluate individual metric performance."
    ]
diff --git a/milo.ipynb b/milo.ipynb
index ad27ed4..132afb2 100644
--- a/milo.ipynb
+++ b/milo.ipynb
@@ -206,7 +206,7 @@
    "id": "productive-growth",
    "metadata": {},
    "source": [
-    "When initializing the Milo object, we create a [MuData](https://mudata.readthedocs.io/en/latest/notebooks/quickstart_mudata.html) object which will store both the gene expression matrices (`rna` view) and cell count matrices used for differential abundance analysis (`milo` view). "
+    "When initializing the Milo object, we create a {class}`~mudata.MuData` object which will store both the gene expression matrices (`rna` view) and cell count matrices used for differential abundance analysis (`milo` view). "
    ]
   },
   {