Fix DOI search and improve MDFClient query handling
MDFClient improvements:
- Add Globus Search index ID constants (MDF_INDEX_ID, MDF_TEST_INDEX_ID)
- Add match_source_names() method with automatic version suffix stripping
- Add _has_field_filters property to detect when advanced query mode is needed
- Use advanced=True automatically for DOI and source_name searches
(required for exact field matching in Globus Search)
- Add try/finally to ensure query state is always reset after search
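The sketch below illustrates how these pieces might fit together. It is not the actual MDFClient code: `ToyQueryClient`, `strip_version_suffix`, `match_doi`, the `_v<number>` suffix pattern, and the `field:"value"` query syntax are all assumptions made for illustration. Only the names `match_source_names`, `_has_field_filters`, and `advanced` come from the commit message.

```python
import re

# Assumed suffix format: a trailing "_v<major>[.<minor>...]", e.g. "_v1.2".
_VERSION_SUFFIX = re.compile(r"_v\d+(?:\.\d+)*$")


def strip_version_suffix(source_name: str) -> str:
    """Return source_name without a trailing version suffix, if present."""
    return _VERSION_SUFFIX.sub("", source_name)


class ToyQueryClient:
    """Illustrative stand-in for a search client; not the real MDFClient."""

    def __init__(self, backend):
        # backend is a hypothetical callable: backend(query, advanced=...) -> list
        self._backend = backend
        self._field_filters = {}  # field name -> exact value to match
        self._free_text = ""

    @property
    def _has_field_filters(self) -> bool:
        # Exact field matching is what requires advanced query mode.
        return bool(self._field_filters)

    def match_doi(self, doi: str) -> "ToyQueryClient":
        self._field_filters["doi"] = doi
        return self

    def match_source_names(self, source_name: str) -> "ToyQueryClient":
        self._field_filters["source_name"] = strip_version_suffix(source_name)
        return self

    def search(self) -> list:
        if self._has_field_filters:
            query = " AND ".join(
                f'{field}:"{value}"' for field, value in self._field_filters.items()
            )
        else:
            query = self._free_text
        try:
            return self._backend(query, advanced=self._has_field_filters)
        finally:
            # Runs whether the backend returns or raises, so stale filters
            # never leak into the next search.
            self._field_filters = {}
            self._free_text = ""
```

The `finally` clause is the important part of the design: without it, an exception raised by the backend would leave the previous filters in place and silently narrow the next query.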
Foundry search fix:
- Pass free-text query to Globus Search for server-side filtering
instead of fetching 10 results and filtering client-side
- This fixes searches like f.search("Computational Band Gaps") that
were failing when the target dataset wasn't in the first 10 results
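A toy model of the failure mode, using an in-memory list in place of Globus Search. The index contents, `server_search`, and both wrapper functions are hypothetical; only the query string and the limit of 10 come from the commit message.

```python
from typing import List

# Hypothetical index: 25 placeholder titles, with the target dataset last.
INDEX = [f"Placeholder dataset {i}" for i in range(25)] + ["Computational Band Gaps"]


def server_search(query: str = "", limit: int = 10) -> List[str]:
    """Toy server: filter by substring first, then apply the result limit."""
    matches = [title for title in INDEX if query.lower() in title.lower()]
    return matches[:limit]


def search_client_side(query: str, limit: int = 10) -> List[str]:
    """Old pattern: fetch the first `limit` records, then filter locally."""
    page = server_search(limit=limit)
    return [title for title in page if query.lower() in title.lower()]


def search_server_side(query: str, limit: int = 10) -> List[str]:
    """New pattern: pass the query along so filtering happens before the limit."""
    return server_search(query, limit=limit)


if __name__ == "__main__":
    print(search_client_side("Computational Band Gaps"))
    print(search_server_side("Computational Band Gaps"))
```

In this toy setup the client-side variant returns an empty list because the target sits outside the first 10 records, while the server-side variant finds it.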
Test additions:
- Add test_load_mp_band_gaps_dataset to verify DOI-based dataset loading
Re-rendered example notebooks with updated outputs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
"<h2>DFT Estimates of Solvation Energy in Multiple Solvents</h2>Ward, Logan; Dandu, Naveen; Blaiszik, Ben; Narayanan, Badri; Assary, Rajeev S.; Redfern, Paul C.; Foster, Ian; Curtiss, Larry A.<p>DOI: 10.18126/jos5-wj65</p><h3>Dataset</h3><table><tr><th>short_name</th><td>g4mp2_solvation</td></tr><tr><th>data_type</th><td>tabular</td></tr><tr><th>task_type</th><td><ul><li>supervised</li></ul></td></tr><tr><th>domain</th><td><ul><li>materials science</li><li>chemistry</li></ul></td></tr><tr><th>n_items</th><td>130258.0</td></tr><tr><th>splits</th><td><ul><li><table><tr><th>type</th><td>train</td></tr><tr><th>path</th><td>g4mp2_data.json</td></tr><tr><th>label</th><td>train</td></tr></table></li></ul></td></tr><tr><th>keys</th><td><table><tr><th>key</th><th>type</th><th>filter</th><th>description</th><th>units</th><th>classes</th></tr><tr><td><ul><li>smiles_0</li></ul></td><td>input</td><td></td><td>Input SMILES string</td><td></td><td></td></tr><tr><td><ul><li>smiles_1</li></ul></td><td>input</td><td></td><td>SMILES string after relaxation</td><td></td><td></td></tr><tr><td><ul><li>inchi_0</li></ul></td><td>input</td><td></td><td>InChi after generating coordinates with CORINA</td><td></td><td></td></tr><tr><td><ul><li>inchi_1</li></ul></td><td>input</td><td></td><td>InChi after relaxation</td><td></td><td></td></tr><tr><td><ul><li>xyz</li></ul></td><td>input</td><td></td><td>InChi after relaxation</td><td>XYZ coordinates after relaxation</td><td></td></tr><tr><td><ul><li>atomic_charges</li></ul></td><td>input</td><td></td><td>Atomic charges on each atom, as predicted from B3LYP</td><td></td><td></td></tr><tr><td><ul><li>A</li></ul></td><td>input</td><td></td><td>Rotational constant, A</td><td>GHz</td><td></td></tr><tr><td><ul><li>B</li></ul></td><td>input</td><td></td><td>Rotational constant, B</td><td>GHz</td><td></td></tr><tr><td><ul><li>C</li></ul></td><td>input</td><td></td><td>Rotational constant, 
C</td><td>GHz</td><td></td></tr><tr><td><ul><li>inchi_1</li></ul></td><td>input</td><td></td><td>InChi after relaxation</td><td></td><td></td></tr><tr><td><ul><li>n_electrons</li></ul></td><td>input</td><td></td><td>Number of electrons</td><td></td><td></td></tr><tr><td><ul><li>n_heavy_atoms</li></ul></td><td>input</td><td></td><td>Number of non-hydrogen atoms</td><td></td><td></td></tr><tr><td><ul><li>n_atom</li></ul></td><td>input</td><td></td><td>Number of atoms in molecule</td><td></td><td></td></tr><tr><td><ul><li>mu</li></ul></td><td>input</td><td></td><td>Dipole moment</td><td>D</td><td></td></tr><tr><td><ul><li>alpha</li></ul></td><td>input</td><td></td><td>Isotropic polarizability</td><td>a_0^3</td><td></td></tr><tr><td><ul><li>R2</li></ul></td><td>input</td><td></td><td>Electronic spatial extant</td><td>a_0^2</td><td></td></tr><tr><td><ul><li>cv</li></ul></td><td>input</td><td></td><td>Heat capacity at 298.15K</td><td>cal/mol-K</td><td></td></tr><tr><td><ul><li>g4mp2_hf298</li></ul></td><td>target</td><td></td><td>G4MP2 Standard Enthalpy of Formation, 298K</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>bandgap</li></ul></td><td>input</td><td></td><td>B3LYP Band gap energy</td><td>Ha</td><td></td></tr><tr><td><ul><li>homo</li></ul></td><td>input</td><td></td><td>B3LYP Energy of HOMO</td><td>Ha</td><td></td></tr><tr><td><ul><li>lumo</li></ul></td><td>input</td><td></td><td>B3LYP Energy of LUMO</td><td>Ha</td><td></td></tr><tr><td><ul><li>zpe</li></ul></td><td>input</td><td></td><td>B3LYP Zero point vibrational energy</td><td>Ha</td><td></td></tr><tr><td><ul><li>u0</li></ul></td><td>input</td><td></td><td>B3LYP Internal energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>u</li></ul></td><td>input</td><td></td><td>B3LYP Internal energy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>h</li></ul></td><td>input</td><td></td><td>B3LYP Enthalpy at 
298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>u0_atom</li></ul></td><td>input</td><td></td><td>B3LYP atomization energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g</li></ul></td><td>input</td><td></td><td>B3LYP Free energy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_0k</li></ul></td><td>target</td><td></td><td>G4MP2 Internal energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_energy</li></ul></td><td>target</td><td></td><td>G4MP2 Internal energy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_enthalpy</li></ul></td><td>target</td><td></td><td>G4MP2 Enthalpy at 298.15K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_free</li></ul></td><td>target</td><td></td><td>G4MP2 Free eergy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>g4mp2_atom</li></ul></td><td>target</td><td></td><td>G4MP2 atomization energy at 0K</td><td>Ha</td><td></td></tr><tr><td><ul><li>sol_acetone</li></ul></td><td>target</td><td></td><td>Solvation energy, acetone</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_acn</li></ul></td><td>target</td><td></td><td>Solvation energy, acetonitrile</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_dmso</li></ul></td><td>target</td><td></td><td>Solvation energy, dimethyl sulfoxide</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_ethanol</li></ul></td><td>target</td><td></td><td>Solvation energy, ethanol</td><td>kcal/mol</td><td></td></tr><tr><td><ul><li>sol_water</li></ul></td><td>target</td><td></td><td>Solvation energy, water</td><td>kcal/mol</td><td></td></tr></table></td></tr></table>"
  ],
  "text/plain": [
- "<foundry.foundry_dataset.FoundryDataset at 0x1342b8230>"
+ "<foundry.foundry_dataset.FoundryDataset at 0x140201070>"
  ]
  },
  "execution_count": 4,
@@ -214,10 +180,70 @@
  },
  {
  "cell_type": "code",
- "execution_count": null,
+ "execution_count": 5,
  "metadata": {},
- "outputs": [],
- "source": "# Get the schema - what columns/fields are in this dataset?\nschema = dataset.get_schema()\n\nprint(f\"Dataset: {schema['name']}\")\nprint(f\"Data Type: {schema['data_type']}\")\nprint(f\"\\nSplits: {[s['name'] for s in schema['splits']]}\")\nprint(f\"\\nFields:\")\nfor field in schema['fields']:\n print(f\" - {field['name']} ({field['role']}): {field['description'] or 'No description'}\")"
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Dataset: foundry_g4mp2_solvation_v1.2\n",
+ "Data Type: tabular\n",
+ "\n",
+ "Splits: ['train']\n",
+ "\n",
+ "Fields:\n",
+ " - smiles_0 (input): Input SMILES string\n",
+ " - smiles_1 (input): SMILES string after relaxation\n",
+ " - inchi_0 (input): InChi after generating coordinates with CORINA\n",
+ " - inchi_1 (input): InChi after relaxation\n",
+ " - xyz (input): InChi after relaxation\n",
+ " - atomic_charges (input): Atomic charges on each atom, as predicted from B3LYP\n",
+ " - A (input): Rotational constant, A\n",
+ " - B (input): Rotational constant, B\n",
+ " - C (input): Rotational constant, C\n",
+ " - inchi_1 (input): InChi after relaxation\n",
+ " - n_electrons (input): Number of electrons\n",
+ " - n_heavy_atoms (input): Number of non-hydrogen atoms\n",
+ " - n_atom (input): Number of atoms in molecule\n",
+ " - mu (input): Dipole moment\n",
+ " - alpha (input): Isotropic polarizability\n",
+ " - R2 (input): Electronic spatial extant\n",
+ " - cv (input): Heat capacity at 298.15K\n",
+ " - g4mp2_hf298 (target): G4MP2 Standard Enthalpy of Formation, 298K\n",
+ " - bandgap (input): B3LYP Band gap energy\n",
+ " - homo (input): B3LYP Energy of HOMO\n",
+ " - lumo (input): B3LYP Energy of LUMO\n",
+ " - zpe (input): B3LYP Zero point vibrational energy\n",
+ " - u0 (input): B3LYP Internal energy at 0K\n",
+ " - u (input): B3LYP Internal energy at 298.15K\n",
+ " - h (input): B3LYP Enthalpy at 298.15K\n",
+ " - u0_atom (input): B3LYP atomization energy at 0K\n",
+ " - g (input): B3LYP Free energy at 298.15K\n",
+ " - g4mp2_0k (target): G4MP2 Internal energy at 0K\n",
+ " - g4mp2_energy (target): G4MP2 Internal energy at 298.15K\n",
+ " - g4mp2_enthalpy (target): G4MP2 Enthalpy at 298.15K\n",
+ " - g4mp2_free (target): G4MP2 Free eergy at 0K\n",
+ " - g4mp2_atom (target): G4MP2 atomization energy at 0K\n",
"author = {Ward, Logan and Dandu, Naveen and Blaiszik, Ben and Narayanan, Badri and Assary, Rajeev S. and Redfern, Paul C. and Foster, Ian and Curtiss, Larry A.}\n",
+ "title = {DFT Estimates of Solvation Energy in Multiple Solvents}\n",