
Commit f70085d

committed
initial commit
0 parents  commit f70085d

11 files changed

Lines changed: 10381 additions & 0 deletions

File tree

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Models to molecules is a cheminformatics blog.

_config.yml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
theme: minima
2+
title: Models to molecules
3+
description: A cheminformatics blog
4+
5+
show_excerpts: true

_layouts/home.html

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
---
2+
layout: default
3+
---
4+
5+
<div class="home">
6+
<div class="post-content">
7+
{%- if site.posts.size > 0 -%}
8+
<ul class="post-list">
9+
{%- for post in site.posts -%}
10+
<li>
11+
{%- assign date_format = site.minima.date_format | default: "%b %-d, %Y" -%}
12+
<span class="post-meta">{{ post.date | date: date_format }}</span>
13+
<h3>
14+
<a class="post-link" href="{{ post.url | relative_url }}">
15+
{{ post.title | escape }}
16+
</a>
17+
</h3>
18+
{%- if site.show_excerpts -%}
19+
{{ post.excerpt }}
20+
{%- endif -%}
21+
</li>
22+
{%- endfor -%}
23+
</ul>
24+
{%- endif -%}
25+
</div>
26+
</div>

_posts/2025-01-01-fwa_edgecases.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
1+
---
2+
layout: post
3+
title: "Free-Wilson edge-cases"
4+
date: 2025-01-01
5+
---
6+
I've always been a fan of Free-Wilson Analysis. It's just such a useful tool: it's easy to use, it helps you understand SAR better, and it generates all the potentially interesting combinations of your R-groups that you might have missed. What's not to love? In this post I'm going to address a few edge cases where the code I'm currently using (written by the fantastic Pat Walters) could use a hand.
7+
8+
9+
Pat has written up a wonderful post on Free-Wilson Analysis over on [Practical Cheminformatics](https://practicalcheminformatics.blogspot.com/2018/05/free-wilson-analysis.html) accompanied by this [notebook](https://colab.research.google.com/github/PatWalters/practical_cheminformatics_tutorials/blob/main/sar_analysis/free_wilson.ipynb), and I'll be using that very code as a starting point. You can find my notebook [here](https://github.com/driesvr/driesvr.github.io/blob/main/notebooks/free_wilson_cornercases.ipynb) if you'd like to follow along.
10+
11+
First, the edge cases. The current code struggles with molecules that have two R-groups on the same attachment point, and with rings that attach to two attachment points simultaneously. I've drawn up three test cases that we'll use to verify that our improvements actually work:
12+
13+
![Edge case examples](/assets/edgecases.PNG)
14+
15+
16+
The first thing we need to fix is the way two R-groups on the same attachment point are handled. By default, RDKit groups these into a single R-group, separated by a period, e.g. `C[*:3].C[*:3]`. This causes trouble when we molzip the fragments back together later on, because the same attachment point then gets used twice, which RDKit (rightfully) doesn't appreciate. This snippet changes the default behaviour to create a second attachment point on the double-substituted positions:
17+
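```python
from rdkit.Chem import rdRGroupDecomposition
from rdkit.Chem.rdRGroupDecomposition import RGroupDecompose

# Allow more than one R-group on an unlabelled core atom
ps = rdRGroupDecomposition.RGroupDecompositionParameters()
ps.allowMultipleRGroupsOnUnlabelled = True

# core_mol and df are defined earlier in the notebook
match, miss = RGroupDecompose(core_mol, df.mol.values, asSmiles=True, options=ps)
```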
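To make the problem concrete, here is a minimal, standard-library-only sketch that spots R-group strings in which one attachment label is used more than once, such as the grouped `C[*:3].C[*:3]` above. The helper name `repeated_labels` is my own and not part of RDKit or the original notebook; it only inspects the SMILES text and does not parse the chemistry.

```python
import re
from collections import Counter

# Matches attachment points written as [*:N] and captures N
LABEL_PATTERN = re.compile(r"\[\*:(\d+)\]")

def repeated_labels(rgroup_smiles):
    """Return the attachment labels that occur more than once in one R-group string."""
    counts = Counter(int(m.group(1)) for m in LABEL_PATTERN.finditer(rgroup_smiles))
    return sorted(label for label, n in counts.items() if n > 1)

print(repeated_labels("C[*:3].C[*:3]"))  # [3]: label 3 would be zipped twice
print(repeated_labels("C[*:3]"))         # []: an ordinary single substituent
```

Running a check like this over the decomposition output is a cheap way to confirm whether any grouped R-groups remain after changing the parameters.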
24+
25+
That fixes the issues with the double-substituted compounds, but we still run into trouble with rings that attach to two attachment points. If a ring attaches to R1 and R5, RDKit will put that moiety in both R1 and R5. This is logical behaviour, but it doesn't play nicely with molzip and the way we're currently laying out the different R-groups. There are two cases here that we will look at in more detail. Firstly, combining the ring `CCC(C[*:1])[*:5]` at R5 with another substituent at R5 (or at R1 with another R1, for that matter) doesn't really make sense - there's no sensible molecule we could make here, so the best we can do is to skip it with the following code snippet:
26+
```python
import re

def has_shared_number(string_tuple):
    """Checks if any number within the [*:number] format is shared among strings in a tuple.

    Args:
        string_tuple: A tuple containing strings with potential [*:number] patterns.

    Returns:
        True if any number is shared, False otherwise.
    """
    all_numbers = []
    for string in string_tuple:
        numbers = [int(match.group(1)) for match in re.finditer(r"\[\*:(\d+)\]", string)]
        all_numbers.extend(numbers)
    return len(set(all_numbers)) < len(all_numbers)

# Use this in our enumeration code
# (tqdm, product, Chem, enc, rgroup_df, full_model, already_made_smiles and
# total_possible_products are defined earlier in the notebook)
prod_list = []
for i, p in tqdm(enumerate(product(*enc.categories)), total=total_possible_products):
    core_smiles = rgroup_df.Core.values[0]
    if has_shared_number(p):
        # Two fragments claim the same attachment point: skip this combination
        continue

    try:
        smi = ".".join(p)
        mol = Chem.MolFromSmiles(smi + "." + core_smiles)
        prod = Chem.molzip(mol)
        prod = Chem.RemoveAllHs(prod)
        prod_smi = Chem.MolToSmiles(prod)
        if prod_smi not in already_made_smiles:
            desc = enc.transform([p])
            prod_pred_ic50 = full_model.predict(desc)[0]
            prod_list.append([prod_smi, prod_pred_ic50])
    except Exception:
        # Print the offending combination and stop the enumeration
        print(p)
        break
```
65+
This works in the sense that the enumeration finishes, but it no longer lets us use the fused ring in our enumerations. This brings us to the second case: the same ring fragment appearing in both R1 and R5, which _does_ correspond to a sensible molecule, since this is exactly how such a molecule gets decomposed by our procedure. In the current iteration of the code it gets skipped, so we need to rewrite the loop a bit to de-duplicate the fragments in the SMILES joining step.
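A quick check shows why the filter alone cannot separate the two cases. The sketch below repeats `has_shared_number` in compact form so that it runs on its own; the methyl fragment `C[*:5]` is a made-up substituent chosen purely for illustration.

```python
import re

def has_shared_number(string_tuple):
    """True if any [*:N] label occurs more than once across the fragments."""
    numbers = [int(m.group(1))
               for s in string_tuple
               for m in re.finditer(r"\[\*:(\d+)\]", s)]
    return len(set(numbers)) < len(numbers)

ring = "CCC(C[*:1])[*:5]"
clash = (ring, "C[*:5]")   # ring plus a different R5: no sensible product
same_ring = (ring, ring)   # same ring reported at R1 and R5: one real molecule

print(has_shared_number(clash))               # True
print(has_shared_number(same_ring))           # True
print(len(set(clash)) != len(clash))          # False: no duplicated fragment
print(len(set(same_ring)) != len(same_ring))  # True: the ring appears twice
```

Both tuples are flagged, but only the second contains a literal duplicate, and that is the distinction the rewritten loop relies on.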
66+
67+
68+
```python
prod_list = []
for i, p in tqdm(enumerate(product(*enc.categories)), total=total_possible_products):
    core_smiles = rgroup_df.Core.values[0]
    if has_shared_number(p):
        if len(set(p)) != len(p):
            # The same fragment shows up more than once (e.g. a ring reported
            # at both R1 and R5): keep a single copy. dict.fromkeys removes
            # duplicates while preserving the original order.
            smi = ".".join(dict.fromkeys(p))
        else:
            # Different fragments compete for the same attachment point: skip
            continue
    else:
        smi = ".".join(p)
    mol = Chem.MolFromSmiles(smi + "." + core_smiles)
    prod = Chem.molzip(mol)
    prod = Chem.RemoveAllHs(prod)
    prod_smi = Chem.MolToSmiles(prod)
    if prod_smi not in already_made_smiles:
        desc = enc.transform([p])
        prod_pred_ic50 = full_model.predict(desc)[0]
        prod_list.append([prod_smi, prod_pred_ic50])
```
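If you'd like to test the skip-or-deduplicate decision without RDKit or a trained model, it can be pulled out into a small pure-Python helper. This is only a sketch: `fragments_to_smiles` is a hypothetical name, and the final re-check after de-duplication is an extra safeguard of my own that the loop above does not perform.

```python
import re

def has_shared_number(string_tuple):
    """True if any [*:N] label occurs more than once across the fragments."""
    numbers = [int(m.group(1))
               for s in string_tuple
               for m in re.finditer(r"\[\*:(\d+)\]", s)]
    return len(set(numbers)) < len(numbers)

def fragments_to_smiles(p):
    """Return dot-joined fragment SMILES, or None if the combination should be skipped."""
    if not has_shared_number(p):
        return ".".join(p)
    unique = list(dict.fromkeys(p))  # drop duplicates, keep order
    if len(unique) == len(p):
        return None  # different fragments clash on a label
    if has_shared_number(unique):
        return None  # extra safeguard: labels still clash after de-duplication
    return ".".join(unique)

ring = "CCC(C[*:1])[*:5]"
print(fragments_to_smiles(("C[*:1]", "N[*:5]")))  # C[*:1].N[*:5]
print(fragments_to_smiles((ring, ring)))          # CCC(C[*:1])[*:5]
print(fragments_to_smiles((ring, "C[*:5]")))      # None
```

In the enumeration loop, a `None` result would correspond to the `continue` branch, and any other result would be joined with the core SMILES before calling `Chem.molzip`.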
88+
And there we have it! Looking at our final enumeration results, we can see that the enumeration now finishes without failures and that our fused systems show up in the results:
89+
![Edge case result](/assets/edgecases_result.PNG)
90+
91+
I hope this helps make your F-W explorations a bit more robust. Please let me know on Bluesky/X if you encounter any bugs!
92+
93+

about.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
---
2+
layout: page
3+
title: About
4+
permalink: /about/
5+
---
6+
I'm Dries Van Rompaey and I’m passionate about using computation to help advance drug discovery programs.
7+
Models to molecules is a blog about anything involving cheminformatics, machine learning and general drug discovery.
8+
Feel free to reach out or connect with me if you’d like to chat or collaborate.
9+
10+
- **GitHub:** [@driesvr](https://github.com/driesvr)
11+
- **Twitter:** [@d_vanrompaey](https://twitter.com/d_vanrompaey)
12+
- **LinkedIn:** [@driesvanrompaey](https://www.linkedin.com/in/driesvanrompaey)
13+
- **Bluesky:** [@dries-vr.bsky.social](https://bsky.app/profile/dries-vr.bsky.social)

assets/edgecases.PNG

29 KB

assets/edgecases_result.PNG

23.1 KB

blog.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
---
2+
layout: home
3+
title: Blog
4+
permalink: /blog/
5+
header_title: false
6+
---

index.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
---
2+
layout: home
3+
title: ""
4+
permalink: /
5+
---

notebooks/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
This holds all notebooks associated with the main blog.
