Striped Unifrac C++ implementation#829
Open
jepasan wants to merge 50 commits into
Open
Conversation
The C++ code works for rooted trees, but not unrooted ones. Possibly a bug in my implementation, more testing needed.
Some datasets still produce divergent values. Likely a bug in my implementation.
Bringing the assay into C++ was using rowTree tip labels for the observation ids, causing nonsense results when they were in different order from the actual rownames. Also added a check for cladewise tree ordering.
However, speed is slower than current implementation
Unweighted is now significantly faster and less memory intensive than the existing implementation
Merge remote-tracking branch 'upstream/devel' into fastunifrac # Conflicts: # DESCRIPTION # NAMESPACE # R/RcppExports.R # R/addAlpha.R # src/.gitignore # src/RcppExports.cpp # src/assay.cpp # src/assay.h # src/propmap.cpp # src/propmap.h # src/tree.h
Member
|
Excellent. Do you have the code available to demonstrate the ecodive comparisons (or could we include it in the unit tests)? This might be helpful for later. |
Contributor
Author
|
The current tests in test-5Unifrac.R already compare the results against ecodive, so that part should already be covered. The datasets used there aren't big enough for there to be any large differences in runtime, but benchmarking should also be possible by taking those comparisons and timing them. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
As discussed in #756, this code adds a C++ implementation of the Striped Unifrac algorithm (https://www.nature.com/articles/s41592-018-0187-8, adapted from the supplementary code in the article), extending the code of the previously added C++ Faith index implementation (#522).
Results should be identical to those provided by ecodive for both weighted and unweighted cases, and at least based on my own testing on a normal laptop somewhat faster and less memory intensive. Further improvements could be likely be made by adding support for multithreading and more optimization. It would also be fairly easy to adapt the code for other forms of unifrac such as generalized unifrac if desired.