
Commit 657b9e8

Merge pull request #1 from DocMinus/updates

Updates

2 parents: c7cebc1 + 7d27477

9 files changed: 170 additions & 121 deletions


.gitignore

Lines changed: 26 additions & 1 deletion

````diff
@@ -1,11 +1,36 @@
 # IDE
 **/.idea/
 **/.vscode/
-**/__pycache__
 **/.ipynb_checkpoints
 
 # other
 /datasets/*_TD.*
 /datasets/*.pkl
 /datasets/*.gz
+.pytest_cache/
 
+# Python build artifacts
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
````
Lines changed: 7 additions & 6 deletions

````diff
@@ -3,6 +3,7 @@
 """
 V2.1.4 (Mar. 08, 08:00:00 2023)
 Update: 2023-06-24 (cleanup for ChemRxiv submission)
+Update: 2024-02-22 (minor cleanup and file renaming)
 
 @author: Alexander Minidis (DocMinus)
 Purpose: TDs from csv
@@ -59,20 +60,20 @@ def main():
     # Calculate TDs
     transforms_descriptors = transform_descriptors(cmpd1_smi, cmpd2_smi, prod_smi)
 
-    # for output create table with structures and combine with calculated TDs
+    # combination of the three structure lists to a df
     _df = pd.DataFrame(
         {"Compound 1": cmpd1_smi, "Compound 2": cmpd2_smi, "Product": prod_smi}
     )
-    # In addition: filter when empty structures
+    # filter when empty structures
     _df = _df[~((_df.iloc[:, :3] == "").any(axis=1))]
-    # The three tables are concatenated to one
+    # Final table combines the structure lists and the TDs
     final_table = pd.concat(
         [in_rct_df["ID"], _df, transforms_descriptors], axis=1, join="inner"
     )
-    # output (optional)
-    print(final_table.tail())
     #############################################################################
-    # Write pickle & csv file
+    # Output, multiple options
+    print(final_table.tail())
+    # Write binary and tsv
     print("\nWriting to file: ", final_output_pkl)
     final_table.to_pickle(final_output_pkl)
     print("\nWriting to file: ", final_output_tsv)
````
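The filter-then-inner-join step in this hunk can be sketched in isolation. A minimal, hedged Python sketch: the `transform_descriptors` output is replaced by a dummy numeric DataFrame (the real function lives in `td_tools.rxntools`), and the SMILES are made up.

```python
import pandas as pd

# Made-up structure lists; the middle reaction has an empty product
cmpd1_smi = ["CCO", "CCN", "CCCl"]
cmpd2_smi = ["CCBr", "CCI", "CCF"]
prod_smi = ["CCOCC", "", "CCOC"]

# Stand-in for the TD table returned by transform_descriptors()
transforms_descriptors = pd.DataFrame({"TD1": [0.1, 0.2, 0.3]})

# combination of the three structure lists to a df
_df = pd.DataFrame(
    {"Compound 1": cmpd1_smi, "Compound 2": cmpd2_smi, "Product": prod_smi}
)
# filter when any of the three structure columns is empty
_df = _df[~((_df.iloc[:, :3] == "").any(axis=1))]

# the inner join drops TD rows whose index was filtered out above
final_table = pd.concat([_df, transforms_descriptors], axis=1, join="inner")
print(final_table.shape)  # (2, 4): the empty-product row is gone
```

The inner join is what keeps structures and descriptors aligned: filtering leaves the original index labels intact, so `concat(..., join="inner")` only pairs rows that survived.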

README.md

Lines changed: 23 additions & 17 deletions

````diff
@@ -3,36 +3,42 @@
 
 # Reaction Transform descriptors
 Python code to calculate reaction transform descriptors as described in [CHEMRXIV](https://chemrxiv.org/engage/chemrxiv/article-details/649888d41dcbb92a5e8e3475), by [@DocMinus](https://github.com/docminus) and [@DrAlatriste](https://github.com/DrAlatriste). <br>
-Not a full fledged package, some scripting know-how necessary to use or incorporate in own code might be necessary.
 
 ## Installation
 See _environment_ folder.
+Updated the installation with a setup file to enable the tools to be part of one's Python environment. Testing has also been added.
 
-## Usage
-Run the provided script by providing a file with tab/semicolon separated data (also comma or space, though not recommended):<br>
-`python 2AB_reaction_TDs.py path/inputfilename`<br>
+## Example Usage
+Run the example script by providing a file with tab/semicolon separated data (also comma or space, though not recommended):
+```shell
+python AB2C_reaction_TDs_example.py inputfilename
+```
 <br>
-You can get help by calling the script using -h: `python 2AB_reaction_TDs.py -h` <br>
+You can get help by calling the script using -h: `python AB2C_reaction_TDs_example.py -h` <br>
 <br>
-This particular script uses fileformat<br>
-_ID reactant1 reactant2 product_<br>
+This particular script expects the input order of the file as<br>
+
+_ID reactant1 reactant2 product_ <br>
 <br>
-The script will provide a simple cleaning of the structures; "extreme" broken structures might not get fixed with the provided method.<br>
+Simple cleaning of structures is included; "extreme" broken structures might not get fixed with the provided method.
 <br>
-Two small test-sets are provided with made up reactions, one of them containing a "faulty" structure to demonstrate correct filtration in the end result. Alternatively, run the _test.py_ script (see below)<br>
+Two small test-sets are provided with made-up reactions, one of them containing a "faulty" structure to demonstrate correct filtration in the output result. <br>Execute via: `python AB2C_reaction_TDs_example.py ./datasets/testreactions.tsv`<br>
 
 ## Syntax
 If you only want to use the TD function, your script requires the following minimum lines with the smiles as string tuples (even if only a single reaction):
+```shell
+from td_tools.rxntools import transform_descriptors
+
+output_table = transform_descriptors(['smiles_reactant1'],['smiles_reactant2'],['product'])
 ```
-from td_tools.rxntools import transform_descriptors
-
-output_table = transform_descriptors(['smiles_reactant1'],['smiles_reactant2'],['product'])
+A cleaning function as well as a file reader function is included for larger datasets:
+```shell
+from td_tools.rxntools import clean_smiles_multi, read_rct2pd
 ```
-A cleaning function as well as a file reader function is included for larger datasets.<br>
-Provided scripts include examples on how to concatenate the structures versus the TDs.<br>
-<br>
-For quick testing and timing use `Python test.py`.<br>
-Not a pytest package, but it nevertheless does the trick for quick demonstrating/testing.<br>
+The provided script includes examples on how to concatenate the structures versus the TDs.<br>
+
+## Testing
+Python testing has been added instead of the previous test.py, see the README.md under /tests.<br>
 <br>
 
 ### Acknowledgments
````
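The _ID reactant1 reactant2 product_ layout described above is simple to mimic without the package's own reader (`read_rct2pd`). A stdlib-only sketch with invented data, showing how such a file maps onto the string lists that `transform_descriptors` expects:

```python
import csv
import io

# Invented two-reaction input in the tab-separated layout the README describes
raw = (
    "ID\treactant1\treactant2\tproduct\n"
    "rxn1\tCCO\tCCBr\tCCOCC\n"
    "rxn2\tCCN\tCCI\tCCNCC\n"
)

# A real file would be opened with open(path); io.StringIO stands in here
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
reactant1 = [r["reactant1"] for r in rows]
reactant2 = [r["reactant2"] for r in rows]
product = [r["product"] for r in rows]

# these parallel string lists are the shape transform_descriptors() takes
print(reactant1, reactant2, product)
```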

environment/README.md

Lines changed: 25 additions & 15 deletions

````diff
@@ -2,23 +2,33 @@
 
 ## Requirements
 Python >= 3.9 is required to use the [modern style](https://peps.python.org/pep-0585/) of type annotations.<br>
-Recommended: 3.11 (due to increased performance over versions <=3.10)<br>
-Modules required are sort of standard for chemistry scripting, rdkit, pandas & numpy, the latter two are nowadays part of a standard conda install.
+Recommended: 3.11 (due to increased performance over earlier versions)<br>
+Modules required are sort of standard for chemistry scripting: rdkit, pandas & numpy; the latter two are nowadays part of a standard conda install.
 
 
-## Installation with Anaconda/Miniconda
-If you nevertheless want a separate environment:<br>
-Run the two commands from the root directory.
+## Installation
+1. Anaconda/Miniconda
+If you nevertheless want a separate environment:<br>
+Run the two commands from the root directory.
 
-```shell
-conda env create -f ./environment/conda.yaml
-conda activate rxn_tds
-```
+```shell
+conda env create -f ./environment/conda.yaml
+conda activate rxn_tds
+```
 
-## Installation with Pip
-If you already have an environment you want to add this into, then:<br>
-Run the command from the root directory
+1b. (alternatively) Venv
+Note that venv would also work if you prefer that.
 
-```shell
-python -m pip install -r ./environment/requirements.txt
-```
+2. Pip
+Now run the requirements with pip into this new environment or into any that you already have.<br>
+Run the command from the root directory
+
+```shell
+pip install -r ./environment/requirements.txt
+pip install .
+```
+
+The latter installs the rxn_tools into the environment. The example script would work without that, but testing requires that.
+
+## Running Tests
+`pytest` is available for testing. See the README.md in /tests.
````
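Step 1b above mentions venv without spelling out the commands. A minimal sketch, assuming a POSIX shell, a repo-root working directory, and an illustrative environment name `.venv`:

```shell
# Create and activate a fresh virtual environment (name is illustrative)
python -m venv .venv
source .venv/bin/activate

# Then proceed with step 2 exactly as in the updated README
pip install -r ./environment/requirements.txt
pip install .
```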

setup.py

Lines changed: 15 additions & 0 deletions

````diff
@@ -0,0 +1,15 @@
+from setuptools import find_packages, setup
+
+setup(
+    name="td_tools",
+    version="2.1.3",
+    python_requires=">=3.9",
+    packages=find_packages(),
+    package_data={
+        "td_tools": ["*.txt"],
+    },
+    description="Reaction Transform Descriptor Tools",
+    author="DocMinus",
+    author_email="alexander.minidis@gmail.com",
+    url="https://github.com/DocMinus/RxnTransformDescriptors",
+)
````

Note: the setuptools keyword is `python_requires` (corrected above from `pythonrequires`, which setuptools would not recognize as a version constraint).

test.py

Lines changed: 0 additions & 82 deletions
This file was deleted.

tests/README.md

Lines changed: 13 additions & 0 deletions

````diff
@@ -0,0 +1,13 @@
+## Running Tests
+`pytest` is available for testing. Follow these steps:
+1. Ensure you have installed the project dependencies, as described in the Installation section.
+2. Install the testing dependencies:
+```bash
+pip install pytest
+```
+followed by
+```bash
+pytest
+```
+
+This command will discover and run all the test cases in the `tests/` directory.
````

tests/__init.py__

Whitespace-only changes.

tests/test_all.py

Lines changed: 61 additions & 0 deletions

````diff
@@ -0,0 +1,61 @@
+#!/usr/bin/env python
+# coding: utf-8
+""" test script, creating artificial data and testing the TDs calculation for reactions
+only tests the combination and final outcome, not the individual functions
+2024-02-22; DocMinus
+"""
+
+import pandas as pd
+import pytest
+
+from td_tools.rxntools import clean_smiles_multi, transform_descriptors
+
+
+def test_clean_smiles_multi_and_transform_descriptors():
+    dataset_size = 4  # number of compounds
+    # we define some faulty/missing compounds, so the output table should have 3 rows less than the input table
+    reactant1 = ["CCCN" for _ in range(dataset_size - 1)]
+    reactant1.append("cc")  # incorrect structure
+    reactant1.append("CCO")
+    reactant1.append("CCCl")
+    reactant1.append("CCCl")
+    total_dataset_size = len(reactant1)
+
+    reactant2 = ["CCCCO" for _ in range(dataset_size)]
+    reactant2.append("CC")
+    reactant2.append("CCO")
+    reactant2.append("CCBr")
+
+    product = ["ClCC1=C(B)C(P)=CC(Br)=C1O" for _ in range(dataset_size)]
+    product.append("cc")  # incorrect structure
+    product.append("")  # missing structure
+    product.append("CCI")
+
+    """ a total of 3 faulty rows; from the bottom of the created table these are the 2nd, 3rd and 4th last rows."""
+
+    g0 = clean_smiles_multi(reactant1)
+    g1 = clean_smiles_multi(reactant2)
+    g2 = clean_smiles_multi(product)
+
+    TD_numbers = transform_descriptors(g0, g1, g2)
+    print(f"{TD_numbers.shape = }, {TD_numbers.shape[1] = }")
+
+    final_table = pd.DataFrame({"Compound 1": g0, "Compound 2": g1, "Product": g2})
+    final_table = final_table[~((final_table.iloc[:, :3] == "").any(axis=1))]
+    final_table = pd.concat([final_table, TD_numbers], axis=1, join="inner")
+
+    # check if the final table is as expected (3 rows less than the input table)
+    assert (
+        final_table.shape[0] == total_dataset_size - 3
+    ), "Number of rows in the final table is not as expected."
+
+    # Check that the 2nd, 3rd, and 4th last rows have been removed
+    removed_indices = [
+        total_dataset_size - 2,
+        total_dataset_size - 3,
+        total_dataset_size - 4,
+    ]
+    for index in removed_indices:
+        assert (
+            index not in final_table.index
+        ), f"Row {index} should have been removed but is still in the final table."
````
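The `removed_indices` assertions in this test depend on pandas keeping the original index labels through boolean-mask filtering, so dropped labels are simply absent from `final_table.index`. A minimal sketch of that behavior (assuming, as the test does, that cleaning maps faulty or missing SMILES to empty strings):

```python
import pandas as pd

# Cleaned products; positions 2 and 3 ended up empty after "cleaning"
cleaned_products = ["CCOCC", "CCOCC", "", "", "CCI"]
table = pd.DataFrame({"Product": cleaned_products})

# Boolean-mask filtering keeps the original labels; dropped ones are absent
table = table[~(table["Product"] == "")]
print(list(table.index))  # [0, 1, 4] -- label 4 stays 4, not renumbered
```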
