
429 Plots for PAE/pLDDT based CL validation #435

Open
jorisfu wants to merge 37 commits into crosslinking from 429-add-plots-for-paeplddt-based-cl-validation

@jorisfu jorisfu commented May 22, 2026

Description

fixes #429

Adds scatter plots for the PAE/pLDDT-based CL validation methods. Each plot shows the absolute deviation of the CL length from the predicted distance on the x-axis and the PAE or average pLDDT used for validation on the y-axis.

Changes

crosslinking_validation.py: added the plots.

Testing

Run a typical CL validation workflow and check that the plots make sense for both monomers and multimers. Make sure to test all validation strategies.

PR checklist

Development

  • If necessary, I have updated the documentation (README, docstrings, etc.)
  • If necessary, I have created / updated tests.

Mergeability

  • crosslinking-branch has been merged into local branch to resolve conflicts
  • The tests and linter have passed AFTER local merge
  • The backend code has been formatted with black

Code review

  • I have self-reviewed my code.
  • At least one other developer reviewed and approved the changes

@jorisfu jorisfu linked an issue May 22, 2026 that may be closed by this pull request
1 task
@jorisfu jorisfu changed the base branch from main to crosslinking May 22, 2026 13:52
@jorisfu jorisfu self-assigned this May 27, 2026
@jorisfu jorisfu marked this pull request as ready for review May 28, 2026 09:22
@jorisfu jorisfu requested review from NeleRiediger and tE3m May 28, 2026 09:23
@jorisfu jorisfu removed the blocked label Jun 1, 2026
@jorisfu jorisfu marked this pull request as draft June 3, 2026 07:45
@jorisfu jorisfu marked this pull request as ready for review June 3, 2026 07:45

@NeleRiediger NeleRiediger left a comment

Testing went fine overall. I just noticed that for different validation methods of the same protein, the number of crosslinks shown in the plots doesn't always match. Also, the number of crosslinks shown in the plots doesn't always match the number shown in the visualization. (I tested with monomer O43242.) This is probably just due to overlapping or outliers, but I wanted to make sure there is no bug in it.

(Also, I believe this is not in the scope of this PR, but I noticed that when the manual bounds method is the default for a newly created step, the input fields for the bounds aren't shown. They only appear once I select the manual bounds method in the dropdown again.)

I made some comments in the code, but it generally looked good too.

Comment thread backend/protzilla/data_analysis/crosslinking_validation.py Outdated
max_dist_delta = max(cl_results_df["distance_delta"])

xmin = -1
if np.log10(xmin) > min_dist_delta:

I'm a little confused here: wouldn't this always be NaN?

Messed that up; it should be 10**xmin. Fixed now.
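For context, a minimal sketch of why the original check could never fire and what the corrected expression evaluates to. It assumes `xmin` is meant as a base-10 exponent for a log-scaled x-axis, which this thread does not show, and the values in `distance_deltas` are made up for illustration.

```python
import numpy as np

xmin = -1  # assumed to be a log10 exponent, i.e. an axis bound of 10**-1

# log10 of a negative number is NaN, and every comparison with NaN is False.
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for this demo
    broken_bound = np.log10(xmin)

distance_deltas = [0.05, 0.4, 3.2]  # hypothetical values, not from the PR
min_dist_delta = min(distance_deltas)

broken_check = bool(broken_bound > min_dist_delta)  # NaN > 0.05
fixed_check = 10**xmin > min_dist_delta             # 0.1 > 0.05

print(broken_bound, broken_check, fixed_check)
```

With the original expression the branch is silently skipped for any data, whereas `10**xmin` compares the actual lower bound against the smallest deviation.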

Comment thread backend/protzilla/data_analysis/crosslinking_validation.py Outdated
max_dist_delta = max(cl_results_df["distance_delta"])

xmin = -1
if np.log10(xmin) > min_dist_delta:

Same as above

jorisfu commented Jun 5, 2026

> Also, the number of crosslinks shown in the plots doesn't always match the number shown in the visualization. (I tested with monomer O43242.) This is probably just due to overlapping or outliers, but I wanted to make sure there is no bug in it.

I believe this is due to overlap, as I don't hide anything. I count 7 CLs both in the plot and in the visualization, and one visual overlap for a CL with a Min. PAE of 8. It would be great if you could double-check that again to ensure this isn't a frontend bug on your side that we need to look at.
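One way to double-check the overlap explanation is to compare the number of rows with the number of distinct marker positions. This is only a sketch: the data is invented, `distance_delta` is the column name seen in the review snippet, and `min_pae` is an assumed name for the y-axis value.

```python
import pandas as pd

# Hypothetical results table; values are illustrative, not taken from O43242.
cl_results_df = pd.DataFrame(
    {
        "distance_delta": [0.5, 1.2, 1.2, 3.0, 4.1, 6.3, 9.8],
        "min_pae": [2, 8, 8, 5, 11, 4, 7],  # assumed column name
    }
)

total_points = len(cl_results_df)

# Rows whose (x, y) pair already appeared are drawn exactly on top of
# an earlier marker, so they are not visible as separate points.
hidden_points = int(
    cl_results_df.duplicated(subset=["distance_delta", "min_pae"]).sum()
)
visible_markers = total_points - hidden_points

print(total_points, hidden_points, visible_markers)
```

This only detects exact coordinate matches; markers that are merely close together can still overlap visually without being counted here.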

github-actions Bot commented Jun 5, 2026

Coverage report

File: backend/protzilla/data_analysis/crosslinking_validation.py
Lines missing: 1094-1187, 1209-1319, 1354-1369, 1411-1426

This report was generated by python-coverage-comment-action

@jorisfu jorisfu requested a review from NeleRiediger June 5, 2026 09:03
@NeleRiediger

> > Also, the number of crosslinks shown in the plots doesn't always match the number shown in the visualization. (I tested with monomer O43242.) This is probably just due to overlapping or outliers, but I wanted to make sure there is no bug in it.
>
> I believe this is due to overlap, as I don't hide anything. I count 7 CLs both in the plot and in the visualization, and one visual overlap for a CL with a Min. PAE of 8. It would be great if you could double-check that again to ensure this isn't a frontend bug on your side that we need to look at.

Yes, that checks out; it seems to work fine now :)
