
Commit 5327b2b

blaiszik and claude authored
Patch/restore mdf client (#473)
* Restore missing mdf_client.py from design-renaissance branch

  This file was part of PR #469 but was not included in the merge, causing ModuleNotFoundError when importing foundry.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix DOI search to return correct dataset

  The forge DOI search can return multiple results where only one actually has the matching DOI. Previously, get_metadata_by_doi() blindly returned the first result, which often didn't have the requested DOI. Now it iterates through results to find the one with the exact DOI match, fixing test_dataframe_search_by_doi and test_dataframe_download_by_doi tests.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Move torch/tensorflow to optional extras to fix CI disk space

  The combined size of torch, tensorflow, and NVIDIA CUDA dependencies exceeded GitHub Actions runner disk space (~4GB+). These ML frameworks are now available as optional extras via pip install .[torch] or pip install .[tensorflow].

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix flake8 linting errors

  - Remove unused imports (sys, rprint, Optional, pandas, numpy)
  - Fix unused exception variable
  - Remove f-string without placeholders
  - Split long line in MCP server description
  - Add noqa comment for intentional re-export

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Replace mdf_forge with internal MDFClient in tests

  Update test imports to use foundry.mdf_client.MDFClient instead of mdf_forge.Forge, which is no longer a required dependency.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add optional extras and document installation

  Move heavy ML dependencies to optional extras to reduce default install size:
  - pip install foundry-ml[torch]
  - pip install foundry-ml[tensorflow]
  - pip install foundry-ml[huggingface]
  - pip install foundry-ml[excel]
  - pip install foundry-ml[examples]
  - pip install foundry-ml[dev]

  Update README with extras install instructions and NumPy 2.0 compatibility note.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix DOI search and improve MDFClient query handling

  MDFClient improvements:
  - Add Globus Search index ID constants (MDF_INDEX_ID, MDF_TEST_INDEX_ID)
  - Add match_source_names() method with automatic version suffix stripping
  - Add _has_field_filters property for elegant advanced mode detection
  - Use advanced=True automatically for DOI and source_name searches (required for exact field matching in Globus Search)
  - Add try/finally to ensure query state is always reset after search

  Foundry search fix:
  - Pass free-text query to Globus Search for server-side filtering instead of fetching 10 results and filtering client-side
  - This fixes searches like f.search("Computational Band Gaps") that were failing when the target dataset wasn't in the first 10 results

  Test additions:
  - Add test_load_mp_band_gaps_dataset to verify DOI-based dataset loading

  Re-rendered example notebooks with updated outputs.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
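The exact-DOI matching and version-suffix stripping described in the commit message can be sketched roughly as below. This is an illustration only, not the code in foundry/mdf_client.py: the helper names `find_exact_doi` and `strip_version_suffix`, the flat `doi` key on each result record, and the `_v<major>.<minor>` suffix pattern are all assumptions made for the example.

```python
import re
from typing import Iterable, Optional

# Assumed suffix shape, e.g. "_v1.2" at the end of a source name.
_VERSION_SUFFIX = re.compile(r"_v\d+(?:\.\d+)*$")


def find_exact_doi(results: Iterable[dict], doi: str) -> Optional[dict]:
    """Return the first record whose DOI equals `doi`, or None.

    Hypothetical helper: each record is assumed to carry its DOI under a
    flat "doi" key. DOIs are compared case-insensitively.
    """
    wanted = doi.strip().lower()
    for record in results:
        if str(record.get("doi", "")).strip().lower() == wanted:
            return record
    # No exact match: do not fall back to the first result.
    return None


def strip_version_suffix(source_name: str) -> str:
    """Drop a trailing version suffix such as "_v1.2" if one is present."""
    return _VERSION_SUFFIX.sub("", source_name)


if __name__ == "__main__":
    hits = [
        {"doi": "10.18126/7io9-1z9k", "name": "band_gaps"},
        {"doi": "10.18126/jos5-wj65", "name": "g4mp2_solvation"},
    ]
    print(find_exact_doi(hits, "10.18126/jos5-wj65"))
    print(strip_version_suffix("foundry_g4mp2_solvation_v1.2"))
```

Returning `None` when nothing matches, rather than the first hit, is the point of the fix: a search that yields several candidates should never silently hand back a dataset with a different DOI.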
1 parent 4dd9c23 commit 5327b2b

12 files changed

Lines changed: 618 additions & 311 deletions


examples/00_hello_foundry/hello_foundry.ipynb

Lines changed: 118 additions & 66 deletions
@@ -45,10 +45,16 @@
4545
},
4646
{
4747
"cell_type": "code",
48-
"execution_count": null,
48+
"execution_count": 2,
4949
"metadata": {},
5050
"outputs": [],
51-
"source": "from foundry import Foundry\n\n# Create a Foundry client (uses HTTPS download by default)\n# For cloud environments (Colab, etc.), add: no_browser=True, no_local_server=True\nf = Foundry()"
51+
"source": [
52+
"from foundry import Foundry\n",
53+
"\n",
54+
"# Create a Foundry client (uses HTTPS download by default)\n",
55+
"# For cloud environments (Colab, etc.), add: no_browser=True, no_local_server=True\n",
56+
"f = Foundry()"
57+
]
5258
},
5359
{
5460
"cell_type": "markdown",
@@ -99,59 +105,19 @@
99105
" <td>root=2022</td>\n",
100106
" <td>10.18126/jos5-wj65</td>\n",
101107
" </tr>\n",
102-
" <tr>\n",
103-
" <th>1</th>\n",
104-
" <td>foundry_assorted_computational_band_gaps_v1.1</td>\n",
105-
" <td>Graph Network Based Deep Learning of Band Gaps...</td>\n",
106-
" <td>root=2021</td>\n",
107-
" <td>10.18126/7io9-1z9k</td>\n",
108-
" </tr>\n",
109-
" <tr>\n",
110-
" <th>2</th>\n",
111-
" <td>foundry_experimental_band_gaps_v1.1</td>\n",
112-
" <td>Graph Network Based Deep Learning of Band Gaps...</td>\n",
113-
" <td>root=2021</td>\n",
114-
" <td>10.18126/wg3u-g8vu</td>\n",
115-
" </tr>\n",
116-
" <tr>\n",
117-
" <th>3</th>\n",
118-
" <td>foundry_aflow_band_gaps_v1.1</td>\n",
119-
" <td>Graph Network Based Deep Learning of Band Gaps...</td>\n",
120-
" <td>root=2021</td>\n",
121-
" <td>10.18126/6fdy-bsam</td>\n",
122-
" </tr>\n",
123-
" <tr>\n",
124-
" <th>4</th>\n",
125-
" <td>foundry_oqmd_band_gaps_v1.1</td>\n",
126-
" <td>Graph Network Based Deep Learning of Band Gaps...</td>\n",
127-
" <td>root=2021</td>\n",
128-
" <td>10.18126/w1ey-9y8b</td>\n",
129-
" </tr>\n",
130108
" </tbody>\n",
131109
"</table>\n",
132110
"</div>"
133111
],
134112
"text/plain": [
135-
" dataset_name \\\n",
136-
"0 foundry_g4mp2_solvation_v1.2 \n",
137-
"1 foundry_assorted_computational_band_gaps_v1.1 \n",
138-
"2 foundry_experimental_band_gaps_v1.1 \n",
139-
"3 foundry_aflow_band_gaps_v1.1 \n",
140-
"4 foundry_oqmd_band_gaps_v1.1 \n",
113+
" dataset_name \\\n",
114+
"0 foundry_g4mp2_solvation_v1.2 \n",
141115
"\n",
142116
" title year \\\n",
143117
"0 DFT Estimates of Solvation Energy in Multiple ... root=2022 \n",
144-
"1 Graph Network Based Deep Learning of Band Gaps... root=2021 \n",
145-
"2 Graph Network Based Deep Learning of Band Gaps... root=2021 \n",
146-
"3 Graph Network Based Deep Learning of Band Gaps... root=2021 \n",
147-
"4 Graph Network Based Deep Learning of Band Gaps... root=2021 \n",
148118
"\n",
149119
" DOI FoundryDataset \n",
150-
"0 10.18126/jos5-wj65 <foundry.foundry_dataset.FoundryDataset object... \n",
151-
"1 10.18126/7io9-1z9k <foundry.foundry_dataset.FoundryDataset object... \n",
152-
"2 10.18126/wg3u-g8vu <foundry.foundry_dataset.FoundryDataset object... \n",
153-
"3 10.18126/6fdy-bsam <foundry.foundry_dataset.FoundryDataset object... \n",
154-
"4 10.18126/w1ey-9y8b <foundry.foundry_dataset.FoundryDataset object... "
120+
"0 10.18126/jos5-wj65 <foundry.foundry_dataset.FoundryDataset object... "
155121
]
156122
},
157123
"execution_count": 3,
@@ -187,7 +153,7 @@
187153
"<h2>DFT Estimates of Solvation Energy in Multiple Solvents</h2>Ward, Logan; Dandu, Naveen; Blaiszik, Ben; Narayanan, Badri; Assary, Rajeev S.; Redfern, Paul C.; Foster, Ian; Curtiss, Larry A.<p>DOI: 10.18126/jos5-wj65</p><h3>Dataset</h3><table><tr><th>short_name</th><td>g4mp2_solvation</td></tr><tr><th>data_type</th><td>tabular</td></tr><tr><th>task_type</th><td><ul><li>supervised</li></ul></td></tr><tr><th>domain</th><td><ul><li>materials science</li><li>chemistry</li></ul></td></tr><tr><th>n_items</th><td>130258.0</td></tr><tr><th>splits</th><td><ul><li><table><tr><th>type</th><td>train</td></tr><tr><th>path</th><td>g4mp2_data.json</td></tr><tr><th>label</th><td>train</td></tr></table></li></ul></td></tr><tr><th>keys</th><td><table><tr><th>key</th><th>type</th><th>filter</th><th>description</th><th>units</th><th>classes</th></tr><tr><td><ul><li>smiles_0</li></ul></td><td>input</td><td></td><td>Input SMILES string</td><td></td><td></td></tr><tr><td><ul><li>smiles_1</li></ul></td><td>input</td><td></td><td>SMILES string after relaxation</td><td></td><td></td></tr><tr><td><ul><li>inchi_0</li></ul></td><td>input</td><td></td><td>InChi after generating coordinates with CORINA</td><td></td><td></td></tr><tr><td><ul><li>inchi_1</li></ul></td><td>input</td><td></td><td>InChi after relaxation</td><td></td><td></td></tr><tr><td><ul><li>xyz</li></ul></td><td>input</td><td></td><td>InChi after relaxation</td><td>XYZ coordinates after relaxation</td><td></td></tr><tr><td><ul><li>atomic_charges</li></ul></td><td>input</td><td></td><td>Atomic charges on each atom, as predicted from B3LYP</td><td></td><td></td></tr><tr><td><ul><li>A</li></ul></td><td>input</td><td></td><td>Rotational constant, A</td><td>GHz</td><td></td></tr><tr><td><ul><li>B</li></ul></td><td>input</td><td></td><td>Rotational constant, B</td><td>GHz</td><td></td></tr><tr><td><ul><li>C</li></ul></td><td>input</td><td></td><td>Rotational constant, 
C</td><td>GHz</td><td></td></tr><tr><td><ul><li>inchi_1</li></ul></td><td>input</td><td></td><td>InChi after relaxation</td><td></td><td></td></tr><tr><td><ul><li>n_electrons</li></ul></td><td>input</td><td></td><td>Number of electrons</td><td></td><td></td></tr><tr><td><ul><li>n_heavy_atoms</li></ul></td><td>input</td><td></td><td>Number of non-hydrogen atoms</td><td></td><td></td></tr><tr><td><ul><li>n_atom</li></ul></td><td>input</td><td></td><td>Number of atoms in molecule</td><td></td><td></td></tr><tr><td><ul><li>mu</li></ul></td><td>input</td><td></td><td>Dipole moment</td><td>D</td><td></td></tr><tr><td><ul><li>alpha</li></ul></td><td>input</td><td></td><td>Isotropic polarizability</td><td>a_0^3</td><td></td></tr><tr><td><ul><li>R2</li></ul></td><td>input</td><td></td><td>Electronic spatial extant</td><td>a_0^2</td><td></td></tr><tr><td><ul><li>cv</li></ul></td><td>input</td><td></td><td>Heat capacity at 298.15K</td><td>cal/mol-K</td><td></td></tr><tr><td><ul><li>g4mp2_hf298</li></ul></td><td>target</td><td></td><td>G4MP2 Standard Enthalpy of Formation, 298K</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>bandgap</li></ul></td><td>input</td><td></td><td>B3LYP Band gap energy</td><td>Ha</td><td></td></tr><tr><td><ul><li>homo</li></ul></td><td>input</td><td></td><td>B3LYP Energy of HOMO</td><td>Ha</td><td></td></tr><tr><td><ul><li>lumo</li></ul></td><td>input</td><td></td><td>B3LYP Energy of LUMO</td><td>Ha</td><td></td></tr><tr><td><ul><li>zpe</li></ul></td><td>input</td><td></td><td>B3LYP Zero point vibrational energy</td><td>Ha</td><td></td></tr><tr><td><ul><li>u0</li></ul></td><td>input</td><td></td><td>B3LYP Internal energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>u</li></ul></td><td>input</td><td></td><td>B3LYP Internal energy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>h</li></ul></td><td>input</td><td></td><td>B3LYP Enthalpy at 
298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>u0_atom</li></ul></td><td>input</td><td></td><td>B3LYP atomization energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g</li></ul></td><td>input</td><td></td><td>B3LYP Free energy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_0k</li></ul></td><td>target</td><td></td><td>G4MP2 Internal energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_energy</li></ul></td><td>target</td><td></td><td>G4MP2 Internal energy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_enthalpy</li></ul></td><td>target</td><td></td><td>G4MP2 Enthalpy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_free</li></ul></td><td>target</td><td></td><td>G4MP2 Free eergy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_atom</li></ul></td><td>target</td><td></td><td>G4MP2 atomization energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>sol_acetone</li></ul></td><td>target</td><td></td><td>Solvation energy, acetone</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_acn</li></ul></td><td>target</td><td></td><td>Solvation energy, acetonitrile</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_dmso</li></ul></td><td>target</td><td></td><td>Solvation energy, dimethyl sulfoxide</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_ethanol</li></ul></td><td>target</td><td></td><td>Solvation energy, ethanol</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_water</li></ul></td><td>target</td><td></td><td>Solvation energy, water</td><td>kcal/mol</td><td></td></tr></table></td></tr></table>"
188154
],
189155
"text/plain": [
190-
"<foundry.foundry_dataset.FoundryDataset at 0x1342b8230>"
156+
"<foundry.foundry_dataset.FoundryDataset at 0x140201070>"
191157
]
192158
},
193159
"execution_count": 4,
@@ -214,10 +180,70 @@
214180
},
215181
{
216182
"cell_type": "code",
217-
"execution_count": null,
183+
"execution_count": 5,
218184
"metadata": {},
219-
"outputs": [],
220-
"source": "# Get the schema - what columns/fields are in this dataset?\nschema = dataset.get_schema()\n\nprint(f\"Dataset: {schema['name']}\")\nprint(f\"Data Type: {schema['data_type']}\")\nprint(f\"\\nSplits: {[s['name'] for s in schema['splits']]}\")\nprint(f\"\\nFields:\")\nfor field in schema['fields']:\n print(f\" - {field['name']} ({field['role']}): {field['description'] or 'No description'}\")"
185+
"outputs": [
186+
{
187+
"name": "stdout",
188+
"output_type": "stream",
189+
"text": [
190+
"Dataset: foundry_g4mp2_solvation_v1.2\n",
191+
"Data Type: tabular\n",
192+
"\n",
193+
"Splits: ['train']\n",
194+
"\n",
195+
"Fields:\n",
196+
" - smiles_0 (input): Input SMILES string\n",
197+
" - smiles_1 (input): SMILES string after relaxation\n",
198+
" - inchi_0 (input): InChi after generating coordinates with CORINA\n",
199+
" - inchi_1 (input): InChi after relaxation\n",
200+
" - xyz (input): InChi after relaxation\n",
201+
" - atomic_charges (input): Atomic charges on each atom, as predicted from B3LYP\n",
202+
" - A (input): Rotational constant, A\n",
203+
" - B (input): Rotational constant, B\n",
204+
" - C (input): Rotational constant, C\n",
205+
" - inchi_1 (input): InChi after relaxation\n",
206+
" - n_electrons (input): Number of electrons\n",
207+
" - n_heavy_atoms (input): Number of non-hydrogen atoms\n",
208+
" - n_atom (input): Number of atoms in molecule\n",
209+
" - mu (input): Dipole moment\n",
210+
" - alpha (input): Isotropic polarizability\n",
211+
" - R2 (input): Electronic spatial extant\n",
212+
" - cv (input): Heat capacity at 298.15K\n",
213+
" - g4mp2_hf298 (target): G4MP2 Standard Enthalpy of Formation, 298K\n",
214+
" - bandgap (input): B3LYP Band gap energy\n",
215+
" - homo (input): B3LYP Energy of HOMO\n",
216+
" - lumo (input): B3LYP Energy of LUMO\n",
217+
" - zpe (input): B3LYP Zero point vibrational energy\n",
218+
" - u0 (input): B3LYP Internal energy at 0K\n",
219+
" - u (input): B3LYP Internal energy at 298.15K\n",
220+
" - h (input): B3LYP Enthalpy at 298.15K\n",
221+
" - u0_atom (input): B3LYP atomization energy at 0K\n",
222+
" - g (input): B3LYP Free energy at 298.15K\n",
223+
" - g4mp2_0k (target): G4MP2 Internal energy at 0K\n",
224+
" - g4mp2_energy (target): G4MP2 Internal energy at 298.15K\n",
225+
" - g4mp2_enthalpy (target): G4MP2 Enthalpy at 298.15K\n",
226+
" - g4mp2_free (target): G4MP2 Free eergy at 0K\n",
227+
" - g4mp2_atom (target): G4MP2 atomization energy at 0K\n",
228+
" - sol_acetone (target): Solvation energy, acetone\n",
229+
" - sol_acn (target): Solvation energy, acetonitrile\n",
230+
" - sol_dmso (target): Solvation energy, dimethyl sulfoxide\n",
231+
" - sol_ethanol (target): Solvation energy, ethanol\n",
232+
" - sol_water (target): Solvation energy, water\n"
233+
]
234+
}
235+
],
236+
"source": [
237+
"# Get the schema - what columns/fields are in this dataset?\n",
238+
"schema = dataset.get_schema()\n",
239+
"\n",
240+
"print(f\"Dataset: {schema['name']}\")\n",
241+
"print(f\"Data Type: {schema['data_type']}\")\n",
242+
"print(f\"\\nSplits: {[s['name'] for s in schema['splits']]}\")\n",
243+
"print(f\"\\nFields:\")\n",
244+
"for field in schema['fields']:\n",
245+
" print(f\" - {field['name']} ({field['role']}): {field['description'] or 'No description'}\")"
246+
]
221247
},
222248
{
223249
"cell_type": "markdown",
@@ -230,22 +256,14 @@
230256
},
231257
{
232258
"cell_type": "code",
233-
"execution_count": null,
259+
"execution_count": 6,
234260
"metadata": {},
235261
"outputs": [
236-
{
237-
"name": "stderr",
238-
"output_type": "stream",
239-
"text": [
240-
"Processing records: 100%|█████████████████████████████████| 1/1 [00:00<00:00, 3266.59it/s]\n",
241-
"Transferring data: 0%| | 0/1 [00:00<?, ?it/s]"
242-
]
243-
},
244262
{
245263
"name": "stdout",
246264
"output_type": "stream",
247265
"text": [
248-
"Error: GC_NOT_CONNECTED - globus connect offline\n"
266+
"Data keys: dict_keys(['train'])\n"
249267
]
250268
}
251269
],
@@ -259,9 +277,20 @@
259277
},
260278
{
261279
"cell_type": "code",
262-
"execution_count": null,
280+
"execution_count": 7,
263281
"metadata": {},
264-
"outputs": [],
282+
"outputs": [
283+
{
284+
"name": "stdout",
285+
"output_type": "stream",
286+
"text": [
287+
"Training data shape: <class 'tuple'>\n",
288+
"\n",
289+
"Inputs (X): <class 'pandas.core.frame.DataFrame'>\n",
290+
"Targets (y): <class 'pandas.core.frame.DataFrame'>\n"
291+
]
292+
}
293+
],
265294
"source": [
266295
"# For ML datasets, data is typically split into inputs (X) and targets (y)\n",
267296
"# Let's explore the training split\n",
@@ -287,9 +316,17 @@
287316
},
288317
{
289318
"cell_type": "code",
290-
"execution_count": null,
319+
"execution_count": 8,
291320
"metadata": {},
292-
"outputs": [],
321+
"outputs": [
322+
{
323+
"name": "stdout",
324+
"output_type": "stream",
325+
"text": [
326+
"Foundry works with PyTorch and TensorFlow out of the box!\n"
327+
]
328+
}
329+
],
293330
"source": [
294331
"# For PyTorch users:\n",
295332
"# torch_dataset = dataset.get_as_torch(split='train')\n",
@@ -314,9 +351,24 @@
314351
},
315352
{
316353
"cell_type": "code",
317-
"execution_count": null,
354+
"execution_count": 9,
318355
"metadata": {},
319-
"outputs": [],
356+
"outputs": [
357+
{
358+
"name": "stdout",
359+
"output_type": "stream",
360+
"text": [
361+
"@misc{https://doi.org/10.18126/jos5-wj65\n",
362+
"doi = {10.18126/jos5-wj65}\n",
363+
"url = {https://doi.org/10.18126/jos5-wj65}\n",
364+
"author = {Ward, Logan and Dandu, Naveen and Blaiszik, Ben and Narayanan, Badri and Assary, Rajeev S. and Redfern, Paul C. and Foster, Ian and Curtiss, Larry A.}\n",
365+
"title = {DFT Estimates of Solvation Energy in Multiple Solvents}\n",
366+
"keywords = {machine learning, foundry}\n",
367+
"publisher = {Materials Data Facility}\n",
368+
"year = {root=2022}}\n"
369+
]
370+
}
371+
],
320372
"source": [
321373
"# Get BibTeX citation\n",
322374
"citation = dataset.get_citation()\n",
@@ -368,4 +420,4 @@
368420
},
369421
"nbformat": 4,
370422
"nbformat_minor": 4
371-
}
423+
}
