Report
Dear Authors,
Thank you for developing and sharing this method. While reading the implementation of the co-occurrence calculation, I have been trying to better understand its normalization strategy, in particular why the denominator is defined as `row_sums[c, r]` rather than as the total number of neighbors surrounding center cells of type `c`. The relevant code in `src/squidpy/gr/_ppatterns.py` is:
```python
occ_prob = np.zeros((k, k, l_val), dtype=np.float64)
row_sums = counts.sum(axis=0)
totals = row_sums.sum(axis=0)
for r in prange(l_val):
    probs = row_sums[:, r] / totals[r]
    for c in range(k):
        for i in range(k):
            if probs[i] != 0.0 and row_sums[c, r] != 0.0:
                occ_prob[i, c, r] = (
                    counts[c, i, r] / row_sums[c, r]
                ) / probs[i]
```
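To experiment with this normalization outside of numba, the loop can be transcribed into plain NumPy. The sketch below is my own transcription, not squidpy code: it replaces `prange` with `range`, takes `counts` as an argument whose shape `(k, k, l_val)` is assumed from the indexing above, and uses a hypothetical function name `occurrence_ratio`.

```python
import numpy as np


def occurrence_ratio(counts: np.ndarray) -> np.ndarray:
    """Hypothetical plain-NumPy transcription of the quoted loop.

    Assumes counts has shape (k, k, l_val), indexed as [center, neighbor, radius].
    """
    k, _, l_val = counts.shape
    occ_prob = np.zeros((k, k, l_val), dtype=np.float64)
    row_sums = counts.sum(axis=0)  # shape (k, l_val): summed over the first (center) axis
    totals = row_sums.sum(axis=0)  # shape (l_val,): total pair count at each radius
    for r in range(l_val):
        probs = row_sums[:, r] / totals[r]  # share of each type i in the total at radius r
        for c in range(k):
            for i in range(k):
                if probs[i] != 0.0 and row_sums[c, r] != 0.0:
                    occ_prob[i, c, r] = (counts[c, i, r] / row_sums[c, r]) / probs[i]
    return occ_prob


# Toy, made-up counts for k = 2 cell types and a single radius bin.
counts = np.array([[[4.0], [2.0]],
                   [[1.0], [3.0]]])
print(occurrence_ratio(counts)[:, :, 0])
```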
From the code, `counts[c, i, r]` appears to represent the number of neighboring pairs within radius `r` in which the center cell is of type `c` and the neighboring cell is of type `i`.
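Under that reading, an array of this kind could be built roughly as follows. This is a hypothetical sketch and not squidpy's counting helper: the function name `count_pairs`, the use of cumulative distance thresholds `radii`, and the choice to count every ordered pair of distinct cells (each cell taking a turn as the center) are all my assumptions.

```python
import numpy as np


def count_pairs(coords: np.ndarray, labels: np.ndarray, radii: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical: counts[c, i, r] = number of ordered (center, neighbor) pairs of
    distinct cells with center type c, neighbor type i and distance <= radii[r]."""
    counts = np.zeros((k, k, len(radii)), dtype=np.int64)
    n = len(labels)
    for a in range(n):          # cell a plays the role of the center
        for b in range(n):      # cell b plays the role of the neighbor
            if a == b:
                continue
            d = np.linalg.norm(coords[a] - coords[b])
            for r, radius in enumerate(radii):
                if d <= radius:
                    counts[labels[a], labels[b], r] += 1
    return counts


# Three made-up cells: one of type 0 and two of type 1.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1, 1])
counts = count_pairs(coords, labels, radii=np.array([1.5, 3.0]), k=2)
print(counts[:, :, 0])
```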
The normalization term is defined as `row_sums = counts.sum(axis=0)`, which yields
$$
\mathrm{row\_sums}[i,r] = \sum_c \mathrm{counts}[c,i,r].
$$
Therefore, row_sums[c, r] corresponds to
$$
\sum_{c'} \mathrm{counts}[c',c,r],
$$
which seems to represent the total number of times cell type $c$ appears as a neighboring cell across all center-cell types at radius $r$.
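A quick numerical check of this indexing on a small made-up array (the values are arbitrary) shows that `counts.sum(axis=0)` sums over the first (center-type) axis, so `row_sums[c, r]` totals the column in which type `c` is the neighbor, which in general differs from the total over neighbors of center type `c`:

```python
import numpy as np

# Made-up counts with shape (k, k, l_val) = (3, 3, 1), indexed [center, neighbor, radius].
counts = np.arange(9, dtype=np.float64).reshape(3, 3, 1)
row_sums = counts.sum(axis=0)  # shape (3, 1)

c, r = 0, 0
over_centers = sum(counts[cp, c, r] for cp in range(3))  # sum_{c'} counts[c', c, r]
over_neighbors = sum(counts[c, i, r] for i in range(3))  # sum_i counts[c, i, r]

print(row_sums[c, r], over_centers, over_neighbors)
```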
My question concerns the denominator
$$
\frac{\mathrm{counts}[c,i,r]}
{\mathrm{row\_sums}[c,r]}.
$$
If the goal is to estimate a conditional probability such as
$$
P(\mathrm{neighbor}=i \mid \mathrm{center}=c),
$$
I would have expected the denominator to be
$$
\sum_i \mathrm{counts}[c,i,r],
$$
i.e., the total number of neighbors observed around center cells of type $c$, since this corresponds directly to conditioning on the center-cell type.
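Written out in full, with the code's `probs[i] = row_sums[i, r] / totals[r]` kept in the denominator, the ratio I expected the code to compute would be the following (my own rewrite, not a formula taken from squidpy):
$$
\frac{\mathrm{counts}[c,i,r] \,/\, \sum_{i'} \mathrm{counts}[c,i',r]}{\mathrm{row\_sums}[i,r] \,/\, \mathrm{totals}[r]}
\;\approx\;
\frac{P(\mathrm{neighbor}=i \mid \mathrm{center}=c)}{P(\mathrm{neighbor}=i)},
\qquad
\mathrm{totals}[r] = \sum_{i'} \mathrm{row\_sums}[i',r].
$$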
In contrast, the current implementation uses
$$
\sum_{c'} \mathrm{counts}[c',c,r],
$$
which appears to normalize by the frequency with which cell type $c$ occurs as a neighboring cell rather than as a center cell.
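The two candidate denominators can be compared directly. In the sketch below the numbers are made up, and the symmetrised array is included only for comparison; it is not a claim about how squidpy fills `counts`. `counts.sum(axis=0)` is the denominator used in the quoted code and `counts.sum(axis=1)` is the one I expected. They differ for an asymmetric array but coincide whenever `counts[c, i, r] == counts[i, c, r]` for all `c` and `i`, since summing a symmetric matrix over either axis gives the same vector.

```python
import numpy as np

# Toy, made-up counts for k = 3 cell types and one radius bin, indexed [center, neighbor, radius].
asym = np.array([[5.0, 1.0, 0.0],
                 [2.0, 3.0, 4.0],
                 [0.0, 6.0, 1.0]])[:, :, None]
sym = asym + asym.transpose(1, 0, 2)  # symmetrised copy, for comparison only

for name, counts in [("asymmetric", asym), ("symmetric", sym)]:
    used = counts.sum(axis=0)      # used[c, r] = sum_{c'} counts[c', c, r] (quoted code)
    expected = counts.sum(axis=1)  # expected[c, r] = sum_i counts[c, i, r] (neighbors of centers c)
    print(name, used[:, 0], expected[:, 0], np.allclose(used, expected))
```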
I would greatly appreciate any explanation of the statistical reasoning behind this normalization strategy.
Thank you very much for your time and for making the implementation publicly available.
Versions