
Striped UniFrac C++ implementation #829

Open
jepasan wants to merge 50 commits into microbiome:devel from jepasan:fastunifrac

jepasan (Contributor) commented May 25, 2026

As discussed in #756, this code adds a C++ implementation of the Striped UniFrac algorithm (https://www.nature.com/articles/s41592-018-0187-8, adapted from the supplementary code in the article), extending the code of the previously added C++ Faith index implementation (#522).
Results should be identical to those from ecodive for both the weighted and unweighted cases, and in my own testing on an ordinary laptop the implementation was somewhat faster and less memory-intensive. Further improvements could likely be made by adding multithreading support and further optimization. It would also be fairly easy to adapt the code to other UniFrac variants, such as generalized UniFrac, if desired.
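To make the stripe layout concrete, here is a minimal C++ sketch of the idea described in the Striped UniFrac paper. It is written for illustration and is not the code in this PR. Everything here is an assumption: the tree is stored as hypothetical `parent`/`length` arrays in postorder (children before parents), tip counts arrive as a nodes-by-samples matrix with all-zero rows for internal nodes, "weighted" means unnormalized weighted UniFrac on relative abundances, and `striped_unifrac` is an illustrative name. Whether this normalization matches ecodive's exactly has not been checked here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical tree layout: nodes stored in postorder (children before
// parents), so the root is last; parent[v] == -1 marks the root.
struct Tree {
    std::vector<int> parent;     // parent index of each node
    std::vector<double> length;  // length of the branch above each node
};

// counts[v][i]: abundance of tip v in sample i (all-zero rows for internal
// nodes). weighted == true: unnormalized weighted UniFrac on relative
// abundances; false: unweighted (presence/absence) UniFrac.
Matrix striped_unifrac(const Tree& tree, Matrix counts, bool weighted) {
    const std::size_t n_nodes = tree.parent.size();
    assert(tree.length.size() == n_nodes && counts.size() == n_nodes);
    const std::size_t n = n_nodes == 0 ? 0 : counts[0].size();

    // Per-sample totals (only tip rows are non-zero at this point).
    std::vector<double> total(n, 0.0);
    for (const auto& row : counts)
        for (std::size_t i = 0; i < n; ++i) total[i] += row[i];

    // Stripe s holds the pairs (i, (i + s + 1) % n); n / 2 stripes cover
    // every unordered pair of samples at least once.
    const std::size_t n_stripes = n / 2;
    Matrix num(n_stripes, std::vector<double>(n, 0.0));
    Matrix den(n_stripes, std::vector<double>(n, 0.0));
    std::vector<double> emb(n);

    for (std::size_t v = 0; v < n_nodes; ++v) {
        if (tree.parent[v] < 0) continue;  // the root has no branch above it
        // Postorder guarantees counts[v] already includes all descendants.
        for (std::size_t i = 0; i < n; ++i) {
            const double p = total[i] > 0 ? counts[v][i] / total[i] : 0.0;
            emb[i] = weighted ? p : (p > 0 ? 1.0 : 0.0);
        }
        const double len = tree.length[v];
        for (std::size_t s = 0; s < n_stripes; ++s) {
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t j = (i + s + 1) % n;
                num[s][i] += len * std::fabs(emb[i] - emb[j]);  // unique branch
                den[s][i] += len * std::fmax(emb[i], emb[j]);   // shared or unique
            }
        }
        // Push this node's counts up to its parent.
        auto& up = counts[static_cast<std::size_t>(tree.parent[v])];
        for (std::size_t i = 0; i < n; ++i) up[i] += counts[v][i];
    }

    // Unpack stripes into a symmetric matrix; for even n the last stripe
    // visits each pair twice with identical values, so overwriting is safe.
    Matrix dist(n, std::vector<double>(n, 0.0));
    for (std::size_t s = 0; s < n_stripes; ++s) {
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t j = (i + s + 1) % n;
            const double d = weighted ? num[s][i]
                             : (den[s][i] > 0 ? num[s][i] / den[s][i] : 0.0);
            dist[i][j] = d;
            dist[j][i] = d;
        }
    }
    return dist;
}
```

Each stripe is a contiguous array indexed by sample, so the inner loop over `i` walks memory sequentially. The UniFrac formulas themselves are unchanged; the layout is intended to improve memory locality.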

jepasan and others added 30 commits February 21, 2025 11:25
The C++ code works for rooted trees, but not unrooted ones. Possibly a bug in my implementation; more testing needed.
Some datasets still produce divergent values. Likely a bug in my implementation.
When the assay was passed to C++, the rowTree tip labels were used as the observation IDs, producing nonsensical results whenever the tip labels were in a different order from the actual rownames.
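The fix described above amounts to matching assay rows to tree tips by name rather than by position. A minimal C++ sketch of that idea follows; `match_rows_to_tips` is a hypothetical helper, not a function from this PR, and throwing on an unmatched row is an assumed policy.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: for each assay row name, return the index of the tree
// tip carrying the same label, so counts are matched by name, not position.
std::vector<std::size_t> match_rows_to_tips(
        const std::vector<std::string>& row_names,
        const std::vector<std::string>& tip_labels) {
    std::unordered_map<std::string, std::size_t> tip_index;
    for (std::size_t t = 0; t < tip_labels.size(); ++t)
        tip_index.emplace(tip_labels[t], t);

    std::vector<std::size_t> out;
    out.reserve(row_names.size());
    for (const auto& name : row_names) {
        auto it = tip_index.find(name);
        if (it == tip_index.end())  // assumed policy: fail loudly
            throw std::invalid_argument("row '" + name + "' has no matching tip");
        out.push_back(it->second);
    }
    return out;
}
```

With rows `{"a", "b", "c"}` and tips `{"b", "c", "a"}`, row `"a"` maps to tip index 2 instead of silently taking tip 0.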

Also added a check for cladewise tree ordering.
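As an illustration of what such an ordering check can test, here is a C++ sketch that verifies one necessary property of cladewise (preorder) edge ordering: every edge's parent is either the root or the child of an earlier edge. It is an assumption-laden sketch, not the check added in this PR. It takes a hypothetical `{parent, child}` edge list in the style of an ape `phylo` edge matrix and does not verify the stricter requirement that each subtree's edges are contiguous.

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical check on an edge list of {parent, child} pairs: returns true
// if every parent is a root or was reached as the child of an earlier edge.
// Iterating such a list in reverse visits children before their parents.
bool parents_before_children(const std::vector<std::pair<int, int>>& edges) {
    std::unordered_set<int> children;
    for (const auto& e : edges) children.insert(e.second);

    std::unordered_set<int> seen;  // nodes already reached from a root
    for (const auto& e : edges)
        if (children.count(e.first) == 0) seen.insert(e.first);  // root(s)

    for (const auto& [parent, child] : edges) {  // C++17 structured binding
        if (seen.count(parent) == 0) return false;  // parent not reached yet
        seen.insert(child);
    }
    return true;
}
```

For tips 1 to 3, root 4 and internal node 5, the cladewise list `{{4, 5}, {5, 1}, {5, 2}, {4, 3}}` passes, while the same edges in reverse order fail.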
antagomir (Member)

Excellent.

Do you have the code available to demonstrate the ecodive comparisons (or could we include it in the unit tests)? This might be helpful for later.

jepasan (Contributor, Author) commented May 26, 2026

The current tests in test-5Unifrac.R already compare the results against ecodive, so that part should be covered. The datasets used there are too small to show large differences in runtime, but benchmarking should also be possible by timing those same comparisons.
