
Commit 349bf14

committed
Ad MMS post
1 parent 1e1e0d5 commit 349bf14

3 files changed

Lines changed: 103 additions & 42 deletions


_includes/head.html

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
<![CDATA[<head>
2+
<meta charset="utf-8">
3+
<meta http-equiv="X-UA-Compatible" content="IE=edge">
4+
<meta name="viewport" content="width=device-width, initial-scale=1">
5+
{%- seo -%}
6+
<link rel="stylesheet" href="{{ "/assets/main.css" | relative_url }}">
7+
{%- feed_meta -%}
8+
9+
<!-- MathJax Configuration -->
10+
<script>
11+
MathJax = {
12+
tex: {
13+
inlineMath: [['$', '$'], ['\\(', '\\)']],
14+
displayMath: [['$$', '$$'], ['\\[', '\\]']],
15+
processEscapes: true
16+
},
17+
svg: {
18+
fontCache: 'global'
19+
}
20+
};
21+
</script>
22+
<script type="text/javascript" id="MathJax-script" async
23+
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
24+
</script>
25+
</head>]]>

_layouts/default.html

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
<!DOCTYPE html>
2+
<html lang="{{ page.lang | default: site.lang | default: "en" }}">
3+
4+
{%- include head.html -%}
5+
6+
<body>
7+
<header>
8+
<div class="wrapper">
9+
<a class="site-title" rel="author" href="{{ "/" | relative_url }}">{{ site.title | escape }}</a>
10+
</div>
11+
</header>
12+
13+
<main class="page-content" aria-label="Content">
14+
<div class="wrapper">
15+
{{ content }}
16+
</div>
17+
</main>
18+
19+
<footer class="site-footer h-card">
20+
<div class="wrapper">
21+
<div class="footer-col-wrapper">
22+
<div class="footer-col footer-col-1">
23+
<p>{{- site.description | escape -}}</p>
24+
</div>
25+
</div>
26+
</div>
27+
</footer>
28+
</body>
29+
</html>

_posts/2025-01-27-matching_molecular_series.md

Lines changed: 49 additions & 42 deletions
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,7 @@ splitting_reactions = [
6363
These reactions should match the [Matsy](https://pubs.acs.org/doi/10.1021/jm500022q) implementation and break single acyclic bonds between either a ring atom and any other atom,
6464
or a heteroatom bonded to a non-sp2 C atom. I'm terminating the bonds we broke with an `At` atom. That's somewhat arbitrary, but I like `At` here because it gives you _technically_ valid molecules that can be read into most cheminformatics software,
6565
it's easy to recognize (at least, until someone tries to put one in a drug) and it _could_ be short for "attachment point". You could also terminate with dummy atoms. We can of course consider other fragmentation options here, like the classic
66-
(Hussain-Rea)[https://pubs.acs.org/doi/10.1021/ci900450m] implementation that splits at any acyclic bond. Any fragmentation method will work here. As long as it splits the molecule into two parts,
66+
[Hussain-Rea](https://pubs.acs.org/doi/10.1021/ci900450m) implementation that splits at any acyclic bond. Any fragmentation method will work here. As long as it splits the molecule into two parts,
6767
downstream steps will be the same. While we're at it, let's also define a reaction to combine the two fragments back together:
6868

6969
```python
@@ -251,7 +251,13 @@ Coming back to the workflow we outlined at the beginning, we will need to write
251251
We start off by filtering out anything in either the reference or query dataset that has fewer than `min_series_length` fragments: anything that doesn't belong to a long enough series can be safely discarded.
252252
We then merge reference and query datasets on their fragment SMILES, identifying fragments present in both datasets. That by itself isn't sufficient: we still need to make sure they belong to a long enough series.
253253
To do that, we group by core+assay (for both reference and query) and count the number of fragments in each series. We then filter out any series with fewer than `min_series_length` fragments.
254-
In that same step, we also compute the cosine similarity between the potency vectors of the reference and query series. This provides an indication of how well trends align between the two series.
254+
In that same step, we also compute the cRMSD between the potency vectors of the reference and query series, a metric proposed by [Ehmki and Kramer](https://pubs.acs.org/doi/10.1021/acs.jcim.6b00709) to track the similarity between two series.
255+
With $x_i$ and $y_i$ the potencies of shared fragment $i$ in the two series, $\bar{x}$ and $\bar{y}$ their means, and $n$ the number of shared fragments, it's computed as follows; the lower the value, the better the trends align between the two series:
256+
257+
\[
258+
\text{cRMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ (x_i - \bar{x}) - (y_i - \bar{y}) \right]^2}
259+
\]
260+
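The cRMSD formula above can be sketched in plain Python. This is a minimal illustration rather than the implementation used in this post: the function name `crmsd` and its two list arguments are assumptions made for the example, and the potency vectors are assumed to be aligned so that position `i` refers to the same fragment in both series.

```python
import math


def crmsd(ref_potencies, query_potencies):
    """Centered RMSD between two aligned potency vectors (hypothetical helper)."""
    if len(ref_potencies) != len(query_potencies):
        raise ValueError("potency vectors must have the same length")
    n = len(ref_potencies)
    if n == 0:
        raise ValueError("potency vectors must not be empty")

    # Center each series on its own mean so a constant offset does not count.
    ref_mean = sum(ref_potencies) / n
    query_mean = sum(query_potencies) / n

    # Squared difference between the centered values, fragment by fragment.
    squared_diffs = [
        ((x - ref_mean) - (y - query_mean)) ** 2
        for x, y in zip(ref_potencies, query_potencies)
    ]
    return math.sqrt(sum(squared_diffs) / n)
```

Because both series are centered first, two series that differ only by a constant shift score a perfect match: `crmsd([1.0, 2.0, 3.0], [5.0, 6.0, 7.0])` returns `0.0`, while opposite trends such as `[1.0, 2.0, 3.0]` and `[3.0, 2.0, 1.0]` give a clearly non-zero value.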
255261
We also track fragments in common and the potencies in both assay sets, here by casting them to a string and joining them with a pipe character.
256262
Finally, we left join the reference and query datasets again on their fragment smiles, but select only those where the query fragment is missing: this set of un-matched fragments will contain - among other things - our new R-groups
257263
of interest. By grouping on assay+core again and merging that with the matched series dataframe, we make sure we only retain R-groups that actually belong to the same matched series,
@@ -378,30 +384,30 @@ If all went well, we should now have a working implementation of matched molecul
378384
functionality was covered in this post. All that's left now is to test everything works and see if we can get some interesting suggestions for the next compounds to try!
379385

380386
```python
381-
def test_concatenation_order(self):
382-
"""
383-
Verifies that the system correctly concatenates
384-
molecules and their respective potency values
385-
386-
"""
387-
ref_data = pd.DataFrame({
388-
'smiles': ['c1ccccc1F', 'c1ccccc1Cl', 'c1ccccc1Br','c1ccccc1N', 'c1ccccc1OC(F)(F)F'],
389-
'potency': [1.0, 2.0, 3.0, 4.0, 5.0],
390-
'assay_col': ['assay1']*5
391-
})
392-
393-
query_data = pd.DataFrame({
394-
'smiles': ['c1cnccc1F', 'c1cnccc1Cl', 'c1cnccc1Br'],
395-
'potency': [1.0, 2.0, 3.0],
396-
'assay_col': ['assay1']*3
397-
})
398-
self.mms.fragment_molecules(ref_data, assay_col='assay_col', query_or_ref='ref')
399-
result = self.mms.query_fragments(query_data, min_series_length=3, assay_col='assay_col')
400-
new_frags = result.new_fragments[0].split('|')
401-
ref_potency = result.new_fragments_ref_potency[0].split('|')
402-
print(new_frags, ref_potency)
403-
self.assertEqual(new_frags.index('N[At]'), ref_potency.index('4.0'))
404-
self.assertEqual(new_frags.index('FC(F)(F)O[At]'), ref_potency.index('5.0'))
387+
def test_concatenation_order(self):
388+
"""
389+
Verifies that the system correctly concatenates
390+
molecules and their respective potency values
391+
392+
"""
393+
ref_data = pd.DataFrame({
394+
'smiles': ['c1ccccc1F', 'c1ccccc1Cl', 'c1ccccc1Br','c1ccccc1N', 'c1ccccc1OC(F)(F)F'],
395+
'potency': [1.0, 2.0, 3.0, 4.0, 5.0],
396+
'assay_col': ['assay1']*5
397+
})
398+
399+
query_data = pd.DataFrame({
400+
'smiles': ['c1cnccc1F', 'c1cnccc1Cl', 'c1cnccc1Br'],
401+
'potency': [1.0, 2.0, 3.0],
402+
'assay_col': ['assay1']*3
403+
})
404+
self.mms.fragment_molecules(ref_data, assay_col='assay_col', query_or_ref='ref')
405+
result = self.mms.query_fragments(query_data, min_series_length=3, assay_col='assay_col')
406+
new_frags = result.new_fragments[0].split('|')
407+
ref_potency = result.new_fragments_ref_potency[0].split('|')
408+
print(new_frags, ref_potency)
409+
self.assertEqual(new_frags.index('N[At]'), ref_potency.index('4.0'))
410+
self.assertEqual(new_frags.index('FC(F)(F)O[At]'), ref_potency.index('5.0'))
405411

406412
```
407413
This test will check that the two additional R-groups in the reference dataset (in this case, an aniline and a trifluoromethoxy) can be retrieved based on the matching series of
@@ -416,22 +422,23 @@ query_df = pd.DataFrame({
416422
})
417423
```
418424

425+
This query being a short series, we will get a lot of matches. Let's take a look at one of them in more detail. This series comes from patent US8765744, targeting 11-beta-hydroxysteroid dehydrogenase 1,
426+
perhaps better known as cortisone reductase. The cRMSD of this series is 0.067, indicating that the trends in potency between the reference and query series are very similar: here we have 6.48 for the
427+
F analog, 7.14 for the Cl and 7.59 for the Br. As such, many of the newly identified R-groups could be considered relevant compounds to try:
428+
419429
| R-group SMARTS | Value |
420430
|----------------|-------|
421-
| O[At] | 6.31 |
422-
| CCO[At] | 7.3 |
423-
| NC(=O)[At] | 6.77 |
424-
| [At]c1cn[nH]c1 | 7.7 |
425-
| CO[At] | 8.0 |
426-
| Cn1nccc1[At] | 7.52 |
427-
| [At]c1ccncc1 | 7.52 |
428-
| C[At] | 7.1 |
429-
| [C-]#[N+][At] | 7.4 |
430-
431-
The series above comes from patent US9012443, targeting Sodium channel protein type 9 subunit alpha, better known as NaV1.7.
432-
I will note that the trends in this series are slightly different than our query - here we have 6.68 for the F analog, 7.3 for the Cl and 6.96 for the Br.
433-
Nevertheless, the cosine similarity is quite high at 0.99, and many of the R-groups proposed herein could be considered as relevant compounds to try. Switching to a different similarity
434-
metric may help identify even more relevant R-groups, as suggested by [Ehmki and Kramer ](https://pubs.acs.org/doi/10.1021/acs.jcim.6b00709) in their paper examining different similarity metrics for SAR transfer.
435-
I'll close off this post with some additional recommended reading on MMS: [Original MMS paper](https://pubs.acs.org/doi/10.1021/jm200026b),
431+
| FC(F)O[At] | 7.09 |
432+
| Cn1ccc([At])cc1=O | 6.74 |
433+
| FC(F)(F)[At] | 7.09 |
434+
| [At]C1CC1 | 7.64 |
435+
| N#C[At] | 6.18 |
436+
| C[At] | 7.19 |
437+
| CO[At] | 7.46 |
438+
| CC(C)(C)[At] | 6.84 |
439+
| FC(F)[At] | 7.06 |
440+
| FC(F)(F)O[At] | 7.24 |
441+
442+
I'll close off this post with some recommended reading on MMS: [Original MMS paper](https://pubs.acs.org/doi/10.1021/jm200026b), [Ehmki and Kramer on metrics for SAR transfer](https://pubs.acs.org/doi/10.1021/acs.jcim.6b00709),
436443
[Matsy paper](https://pubs.acs.org/doi/10.1021/jm500022q), and [MMS for ADME](https://pubs.acs.org/doi/full/10.1021/acs.jcim.0c00269). That's it for now - hope you found this interesting and as always,
437-
please let me know if you spot any mistakes or better yet, submit a PR!
444+
please let me know if you spot any mistakes or better yet, submit a PR to fix them!
