Confusion about the correct formulation and interpretation of the co_occurrence calculation method in gr.co_occurrence

### Report

Dear Authors,

Thank you for developing and sharing this method. While reading the implementation of the co-occurrence calculation, I tried to understand the normalization strategy, in particular **why the denominator is `row_sums[c, r]` rather than the total number of neighbors surrounding center cells of type c**. The relevant code is in `src/squidpy/gr/_ppatterns.py`:
```python
occ_prob = np.zeros((k, k, l_val), dtype=np.float64)

row_sums = counts.sum(axis=0)
totals = row_sums.sum(axis=0)

for r in prange(l_val):
    probs = row_sums[:, r] / totals[r]

    for c in range(k):
        for i in range(k):
            if probs[i] != 0.0 and row_sums[c, r] != 0.0:
                occ_prob[i, c, r] = (
                    counts[c, i, r] / row_sums[c, r]
                ) / probs[i]
```

From the code, `counts[c, i, r]` appears to represent the number of neighboring pairs within radius $r$ in which the center cell is of type $c$ and the neighboring cell is of type $i$.

The normalization term is defined as `row_sums = counts.sum(axis=0)`, which yields

$$
\mathrm{row\_sums}[i,r]
=
\sum_c \mathrm{counts}[c,i,r].
$$

Therefore, `row_sums[c, r]` corresponds to

$$
\sum_{c'} \mathrm{counts}[c',c,r],
$$

which seems to represent the total number of times cell type $c$ appears as a neighboring cell across all center-cell types at radius $r$.
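
To double-check my reading of `axis=0`, here is a minimal NumPy sketch on a toy array. The values are made up for illustration, and the `[center, neighbor, radius]` index order is my assumption based on the snippet above, not something I have confirmed in the library.

```python
import numpy as np

# Toy counts with shape (k, k, l_val) = (2, 2, 1).
# Assumed index order: [center type, neighbor type, radius].
counts = np.array([
    [[4.0], [1.0]],  # center type 0: 4 neighbors of type 0, 1 of type 1
    [[3.0], [2.0]],  # center type 1: 3 neighbors of type 0, 2 of type 1
])

# Summing over axis 0 collapses the first (center) index:
# row_sums[i, r] = sum over c of counts[c, i, r]
row_sums = counts.sum(axis=0)
print(row_sums[:, 0])  # neighbor-side totals: [7. 3.]

# Summing over axis 1 collapses the second (neighbor) index instead:
# center_sums[c, r] = sum over i of counts[c, i, r]
center_sums = counts.sum(axis=1)
print(center_sums[:, 0])  # center-side totals: [5. 5.]
```

On this toy input the two totals differ, so under the assumed index order `row_sums[c, r]` is a neighbor-side total.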

My question concerns the denominator

$$
\frac{\mathrm{counts}[c,i,r]}
     {\mathrm{row\_sums}[c,r]}.
$$

If the goal is to estimate a conditional probability such as

$$
P(\mathrm{neighbor}=i \mid \mathrm{center}=c),
$$

I would have expected the denominator to be

$$
\sum_i \mathrm{counts}[c,i,r],
$$

i.e., the total number of neighbors observed around center cells of type $c$, since this corresponds directly to conditioning on the center-cell type.
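
As a small illustration of the denominator I expected, the sketch below normalizes a toy `counts` array by the center-side total. The array values and the `[center, neighbor, radius]` index order are assumptions of mine, and the sketch assumes every center type has at least one neighbor so that no division by zero occurs.

```python
import numpy as np

# Toy counts, assumed index order: [center type, neighbor type, radius].
counts = np.array([
    [[4.0], [1.0]],
    [[3.0], [2.0]],
])

# Total number of neighbors observed around centers of each type:
# center_totals[c, 0, r] = sum over i of counts[c, i, r]
center_totals = counts.sum(axis=1, keepdims=True)

# cond[c, i, r] estimates P(neighbor = i | center = c) at radius r.
cond = counts / center_totals

print(cond[:, :, 0])
# For each center type c, the estimates sum to 1 over neighbor types i.
print(cond.sum(axis=1)[:, 0])
```

With this denominator each row is a proper distribution over neighbor types, which is what conditioning on the center type requires.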

In contrast, the current implementation uses

$$
\sum_{c'} \mathrm{counts}[c',c,r],
$$

which appears to normalize by the frequency with which cell type $c$ occurs as a neighboring cell rather than as a center cell.
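
One possibility I considered is that the two denominators coincide when `counts` is symmetric in its first two indices, i.e. when each pair is counted in both directions. The sketch below checks this on toy data. The helper `ratio` is hypothetical and only mirrors the structure of the quoted snippet; whether the library actually builds `counts` symmetrically is something I have not verified.

```python
import numpy as np

def ratio(counts, denom):
    """Hypothetical helper: occ[i, c, r] = (counts[c, i, r] / denom[c, r]) / P(neighbor = i)."""
    neighbor_sums = counts.sum(axis=0)                  # indexed [i, r]
    probs = neighbor_sums / neighbor_sums.sum(axis=0)   # indexed [i, r]
    cond = counts / denom[:, None, :]                   # indexed [c, i, r]
    return (cond / probs[None, :, :]).transpose(1, 0, 2)  # reorder to [i, c, r]

# Toy data, assumed index order: [center type, neighbor type, radius].
asym = np.array([
    [[4.0], [1.0]],
    [[3.0], [2.0]],
])
sym = asym + asym.transpose(1, 0, 2)  # symmetric in the first two indices

same = {}
for name, counts in [("asymmetric", asym), ("symmetric", sym)]:
    as_neighbor = counts.sum(axis=0)  # denominator used in the quoted snippet
    as_center = counts.sum(axis=1)    # denominator I would have expected
    same[name] = bool(
        np.allclose(ratio(counts, as_neighbor), ratio(counts, as_center))
    )
    print(name, same[name])
```

On this toy input the two normalizations disagree for the asymmetric array and agree for the symmetric one. If `counts` is guaranteed to be symmetric in the library, that might explain the choice, but I would appreciate confirmation.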

I would greatly appreciate any explanation of the statistical reasoning behind this normalization strategy.

Thank you very much for your time and for making the implementation publicly available.

### Versions

```python

```
