Commit 8d3caac

Text refinement
1 parent a3e2efe commit 8d3caac

1 file changed

Lines changed: 2 additions & 2 deletions

_posts/2025-01-27-matching_molecular_series.md

@@ -14,7 +14,7 @@ I worked together on this little project with my good friend (and software devel
First things first, the actual code can be found [here](https://github.com/driesvr/MatchMolSeries). The logic behind MMS is well described in the paper linked above, but the broad strokes of it are fairly intuitive.
Let’s assume you’re a medicinal chemist working on a new series. You make a few analogues on a given position. The data comes back and you notice that the SAR is really similar to that of another set of compounds you worked on once upon a time for a different target. Intriguing! You dust off your old ELN, looking for the most potent substituents you made back then and put them into synthesis.

-In a nutshell, that's what MMS does, only systematically. By finding other compound series where the SAR on a given position appears to track similarly, typically on different targets and/or scaffolds, we can use those other series to decide on what other groups we can try on a given position of our own scaffold.
+The above is in a nutshell what MMS does, except MMS enables you to do it systematically. By finding other compound series where the SAR on a given position appears to track similarly (typically on different targets and/or scaffolds) we can use those other series to decide on what other groups we can try on a given position of our own scaffold.

On a practical level, we can break it down into a few steps.
- Break apart all compounds into cores and fragments
@@ -253,7 +253,7 @@ We start off by filtering out anything in either the reference or query dataset
We then merge reference and query datasets on their fragment SMILES, identifying fragments present in both datasets. That by itself isn't sufficient: we still need to make sure they belong to a long enough series.
To do that, we group by core+assay (for both reference and query) and count the number of fragments in each series. We then filter out any series with fewer than `min_series_length` fragments.
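The merge-then-filter step described above can be sketched in plain Python. This is only an illustration and not the MatchMolSeries implementation: the record layout `(core, assay, fragment_smiles, potency)`, the function name `shared_series`, and the example data are all assumptions made for this sketch, and it assumes the length filter applies to the fragments a reference series and a query series have in common.

```python
from collections import defaultdict


def shared_series(reference, query, min_series_length=3):
    """Return {(ref_series, query_series): {fragment: (ref_potency, query_potency)}}.

    Each record is a hypothetical (core, assay, fragment_smiles, potency) tuple,
    and a series is identified by its (core, assay) pair.
    """
    # Index the query by fragment so the join on fragment SMILES is a lookup.
    query_by_fragment = defaultdict(list)
    for core, assay, fragment, potency in query:
        query_by_fragment[fragment].append(((core, assay), potency))

    # Collect the shared fragments for every (reference series, query series) pair.
    pairs = defaultdict(dict)
    for core, assay, fragment, potency in reference:
        for query_series, query_potency in query_by_fragment.get(fragment, []):
            pairs[((core, assay), query_series)][fragment] = (potency, query_potency)

    # Drop pairs that share fewer than min_series_length fragments.
    return {k: v for k, v in pairs.items() if len(v) >= min_series_length}


# Made-up example data: two reference series and one query series.
reference = [
    ("A", "a1", "*F", 6.0), ("A", "a1", "*Cl", 6.5), ("A", "a1", "*Br", 7.1),
    ("B", "a1", "*F", 5.0), ("B", "a1", "*Cl", 5.2),
]
query = [
    ("Q", "q1", "*F", 7.0), ("Q", "q1", "*Cl", 7.4),
    ("Q", "q1", "*Br", 8.2), ("Q", "q1", "*OC", 6.0),
]

matches = shared_series(reference, query, min_series_length=3)
for (ref_series, query_series), fragments in matches.items():
    print(ref_series, query_series, sorted(fragments))
```

In this toy data, series B shares only two fragments with the query, so it is dropped, while series A shares three and is kept.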
In that same step, we also compute the cRMSD (a metric proposed by [Ehmki and Kramer](https://pubs.acs.org/doi/10.1021/acs.jcim.6b00709) to track the similarity between the two series) between the potency vectors of the reference and query series.
-It's computed as follows and provides an indication of how well trends align between the two series:
+It provides an indication of how well trends align between the two series, with lower values indicating better similarity (and a cRMSD of 0 indicating that the SAR tracks perfectly). It's computed as follows:

\\\[
\text{cRMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - \bar{x}) - (y_i - \bar{y}) \right]^2}
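The formula above translates into a few lines of plain Python. The function name `crmsd` and the potency values below are made up for illustration; this is a sketch of the formula as written, not the package's own code.

```python
import math


def crmsd(x, y):
    """Centered RMSD between two equally long potency vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centering each vector removes any constant potency offset between the series.
    squared = [((xi - x_mean) - (yi - y_mean)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / n)


reference = [6.0, 6.5, 7.1]
shifted = [p + 1.0 for p in reference]  # same trend, one log unit more potent
reversed_trend = [7.1, 6.5, 6.0]        # same values, opposite ranking

print(crmsd(reference, shifted))         # essentially zero, up to float rounding
print(crmsd(reference, reversed_trend))  # clearly larger
```

Because both vectors are mean-centered first, a series that is uniformly more or less potent still scores a cRMSD of (essentially) 0 as long as the trend across fragments is the same.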
