
feat: Add functions for getting information from Structure#187

Open
cadenmyers13 wants to merge 21 commits into diffpy:main from cadenmyers13:ase-adapter

Conversation

@cadenmyers13
Contributor

@cadenmyers13 cadenmyers13 commented Mar 24, 2026

See news file for all the methods that are added.

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.21%. Comparing base (c5659e9) to head (65d2729).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #187      +/-   ##
==========================================
+ Coverage   99.20%   99.21%   +0.01%     
==========================================
  Files          15       15              
  Lines        2507     2554      +47     
==========================================
+ Hits         2487     2534      +47     
  Misses         20       20              
Files with missing lines Coverage Δ
tests/test_structure.py 99.82% <100.00%> (+0.01%) ⬆️

@cadenmyers13
Contributor Author

@sbillinge Ready for review: an ASE-to-diffpy.structure adapter.

@sbillinge
Contributor

Let's chat about this. Adding ASE as a dependency is a big step, and we need to understand the use cases better. Let's start with use cases, then work on design and tests, and then code. Especially now that AI writes much of the code, this process matters most: code-writing is now the lowest cost, and the biggest risk is introducing bloat and entropy, because the AI doesn't seem to care much about that.

@cadenmyers13
Contributor Author

@sbillinge Here are the UCs I thought of; they guided the code on this PR. Only option 2 is included here. Option 1 would be on a different PR. Both 1a and 1b would need to be implemented in ASE, but I included them here for completeness:

  1. User wants to convert diffpy --> ASE
    a. User calls a method in diffpy.structure.Structure and passes a diffpy obj to convert diffpy --> ASE, or
    b. User calls a method in ASE.Atoms and passes a diffpy obj to convert diffpy --> ASE

  2. User wants to convert ASE --> diffpy
    a. User calls a method in diffpy.structure.Structure and passes an ASE obj to convert ASE --> diffpy, or
    b. User calls a method in ASE.Atoms and passes an ASE obj to convert ASE --> diffpy

Any other use cases you could think of?

Also, we might be able to do this without adding ASE as a dependency. Its only use is an isinstance check that raises an error; we could probably remove it, and the user would get an error downstream from something else. Presumably, if you're trying to convert a diffpy object to ASE, you already have ASE installed in your env. I don't know how removing this will go over in testing, though. We could just add ase to tests.txt, right?
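A minimal sketch of the no-hard-dependency idea, assuming the converter only needs element symbols, fractional coordinates, and cell parameters: duck-type the input instead of checking `isinstance(obj, ase.Atoms)`. `get_chemical_symbols()`, `get_scaled_positions()` and `cell.cellpar()` are existing ASE `Atoms`/`Cell` methods; the function name `atoms_like_to_dict` is hypothetical.

```python
from typing import Any


def atoms_like_to_dict(obj: Any) -> dict:
    """Extract what a diffpy Structure needs from an ASE-Atoms-like object.

    Hypothetical helper. It uses duck typing (no ``import ase``, no isinstance
    check), so ASE stays optional, but still fails early with a clear message.
    """
    required = ("get_chemical_symbols", "get_scaled_positions")
    missing = [name for name in required if not callable(getattr(obj, name, None))]
    if missing or not hasattr(getattr(obj, "cell", None), "cellpar"):
        raise TypeError(
            f"expected an ase.Atoms-like object, got {type(obj).__name__}"
        )
    return {
        "symbols": list(obj.get_chemical_symbols()),
        # fractional coordinates, one row per atom
        "xyz": [list(map(float, row)) for row in obj.get_scaled_positions()],
        # (a, b, c, alpha, beta, gamma), angles in degrees
        "cellpar": [float(v) for v in obj.cell.cellpar()],
    }
```

The outputs map onto diffpy.structure's `Lattice(a, b, c, alpha, beta, gamma)` and per-atom fractional `xyz`. A unit test can pass a small stand-in object with those methods, so ase would only be needed in tests.txt for an end-to-end check.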

@sbillinge
Contributor

The main question is serialization. I think that if it is possible to serialize ASE objects, we can read them without an ASE dependency. We can also have code to write them.

Your UCs are good for some kind of high-throughput pipeline where ASE is generating large numbers of objects that are passed to diffpy, and serialization would be a serious performance bottleneck. However, I do see the use and convenience. It occurred to me that we could make an ASE pack so we don't pick up a new dependency in the core.
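To make the serialization route concrete, here is a minimal sketch that reads one ASE-written extended XYZ frame without importing ASE. It assumes a single frame, a `Lattice="..."` key on the comment line holding the three cell vectors row by row, and species plus Cartesian x, y, z as the first four columns of each atom line; any other properties are ignored. `read_extxyz_frame` is a hypothetical name.

```python
import shlex

import numpy as np


def read_extxyz_frame(text: str):
    """Return (symbols, fractional_xyz, cell) from one extended-XYZ frame."""
    lines = text.strip().splitlines()
    natoms = int(lines[0].split()[0])
    # Parse key=value pairs on the comment line; shlex handles quoted values.
    info = {}
    for token in shlex.split(lines[1]):
        if "=" in token:
            key, value = token.split("=", 1)
            info[key] = value
    if "Lattice" not in info:
        raise ValueError("no Lattice key: cannot build a periodic structure")
    cell = np.array(info["Lattice"].split(), dtype=float).reshape(3, 3)
    symbols, cart = [], []
    for line in lines[2 : 2 + natoms]:
        fields = line.split()
        symbols.append(fields[0])
        cart.append([float(v) for v in fields[1:4]])
    # Rows of `cell` are lattice vectors, so cart = frac @ cell; solve for frac.
    frac = np.linalg.solve(cell.T, np.array(cart).T).T
    return symbols, frac, cell
```

Symbols, fractional coordinates and the cell are what a diffpy Structure needs, so a writer could emit the same layout in the other direction.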

@sbillinge
Contributor

I think that thinking of actual UCs could be good. The one highest on my list would be to take an MD simulation and compute the PDF. Another would be to support something like Soham's cluster mining. We should check how he did it, as he may already have some code. I would guess that the code is on GitLab.

@sbillinge
Contributor

For these UCs, do you know whether it is Atoms objects or some other kind of ASE object that is passed?

@cadenmyers13 cadenmyers13 changed the title feat: Add structure adapter to convert ASE.Atom objects to diffpy.structure.Structure objects and additional useful methods feat: Add functions for getting information from Structure Apr 2, 2026
@cadenmyers13
Contributor Author

@sbillinge I moved the discussion from here to the associated issue. I've changed this PR to include only the functions for getting structural information.

Contributor

@sbillinge sbillinge left a comment


this looks good. Please see comments.

# End of class TestStructure


def test_get_chemical_symbols(datafile):
Contributor


Do we need the case where the symbols are stored as ions? In that case, I guess the desired behavior is that this returns the list without stripping the ionic charge, right?

Then there may be a different function that cleans them.

In the future, we may want to handle ions differently from non-ions, so (and I am not sure what the actual behavior is, because I don't see a test for it) it would be better if this getter returned the symbols uncleaned and cleaning were a separate step for when we want it (which is what we want at the moment). Does that make sense?

Contributor Author


@sbillinge Yes, makes sense. All atoms get passed through the atom_bare_symbol function, which strips isotope and charge annotations and returns only the element symbol. The challenge with supporting ions is that the atom type passed in is sometimes a bare symbol, sometimes an ion, and sometimes something else, such as the 12-C input in this test:

@pytest.mark.parametrize(
    "symbol, expected",
    [
        ("Cl-", "Cl"),
        ("Ca2+", "Ca"),
        ("12-C", "C"),
    ],
)
def test_atom_bare_symbol(symbol, expected):
    actual = atom_bare_symbol(symbol)
    assert actual == expected

I think it's better to put this on a separate PR/issue. I'll make that issue.
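For reference, the stripping that the parametrized cases above expect can be written as one regular expression. This is a hypothetical `bare_symbol` sketch, not diffpy.structure's `atom_bare_symbol` implementation; it assumes an optional leading mass number with an optional dash, a one- or two-letter element, and an optional trailing charge such as `2+` or `-`.

```python
import re

# Optional isotope prefix ("12-", "13"), the element, optional charge ("2+", "-").
_SYMBOL_RE = re.compile(r"^\s*(?:\d+-?)?([A-Za-z]{1,2})(?:\d*[+-])?\s*$")


def bare_symbol(symbol: str) -> str:
    """Strip isotope and ionic-charge decorations from an atom type string."""
    match = _SYMBOL_RE.match(symbol)
    if match is None:
        # Refuse to guess on inputs outside the assumed grammar.
        raise ValueError(f"unrecognized atom symbol: {symbol!r}")
    return match.group(1).capitalize()
```

Raising on unrecognized input, rather than guessing, keeps the behavior explicit while the ion handling is decided in the separate issue.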


@pytest.mark.parametrize(
    "return_array",
    [  # case: user wants ADPs as an array
Contributor


nice, I like these cases

    else:
        expected_displacement = {
            # Iodine
            "I_8_11": np.float64(0.025),
Contributor


Your test seems to require np.float64 types. Is this the behavior you want? It seems oddly specific. Remember, tests are supposed to test desired "behavior". Is it desired behavior that it returns float64?

Contributor Author


Yeah, it's probably not desirable to require np.float64. I only included it because that was what the output contained. I've removed it.
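One way to make the test assert values rather than types is `pytest.approx`, which compares numbers within a tolerance and also accepts dicts, so a NumPy float64 in the actual output matches a plain float in the expected dict. The key and value below are copied from the test; the surrounding variables are illustrative only.

```python
import numpy as np
import pytest

# What the getter might return (a NumPy scalar) vs. what the test expects.
actual_displacement = {"I_8_11": np.float64(0.025)}
expected_displacement = {"I_8_11": 0.025}

# Keys must match exactly; values are compared within a tolerance, and the
# float64 vs. float distinction does not matter.
assert actual_displacement == pytest.approx(expected_displacement)
```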

    "return_array",
    [  # case: user wants isotropic displacement parameters as an array
        # expected: a 1D array of shape (num_atoms,) with the Uiso values
        True,
Contributor


This is a slightly odd and non-standard use of parametrize. We normally give ([inputs], [outputs]), which is more extensible and more readable by a human, and then the test itself is very short and has little or no logic in it.

Contributor Author


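Okay, I updated it. I still included some logic because the assert statements differ slightly between cases: a bare == on NumPy arrays returns an element-wise array rather than a single bool, so the array case needs np.allclose.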
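A sketch of the ([inputs], [outputs]) style with no branching in the test body, assuming a hypothetical `get_uiso(structure, return_array)` that returns either an array or a label-keyed dict. Because `pytest.approx` accepts both NumPy arrays and dicts, one assertion covers both cases. The stand-in function, the `STRUCTURE` data layout, and the `<label>_Uiso` key format are assumptions for illustration.

```python
import numpy as np
import pytest


def get_uiso(structure, return_array):
    """Hypothetical stand-in for the getter under test."""
    labels = [atom["label"] for atom in structure]
    values = [atom["Uiso"] for atom in structure]
    if return_array:
        return np.array(values)
    return {f"{label}_Uiso": value for label, value in zip(labels, values)}


# Hypothetical one-atom structure; value copied from the test above.
STRUCTURE = [{"label": "Pb_1", "Uiso": 0.0225566}]


@pytest.mark.parametrize(
    "return_array, expected",
    [
        # case: user wants Uiso values as an array
        # expected: 1D array of shape (num_atoms,)
        (True, np.array([0.0225566])),
        # case: user wants Uiso values keyed by atom label
        # expected: dict mapping "<label>_Uiso" to the value
        (False, {"Pb_1_Uiso": 0.0225566}),
    ],
)
def test_get_uiso(return_array, expected):
    actual = get_uiso(STRUCTURE, return_array)
    assert actual == pytest.approx(expected)
```

Adding a case then means adding one (inputs, expected) tuple, and the test body stays a single call plus a single assertion.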

        assert np.allclose(actual_isotropic_displacement, expected_isotropic_displacement)
    else:
        expected_isotropic_displacement = {
            "Pb_1_Uiso": np.float64(0.0225566),
Contributor


see comment above.

@cadenmyers13
Contributor Author

@sbillinge Ready for review. I responded to the comments.


Labels

None yet

Projects

None yet


2 participants