Avoid MariaDB join explosion for disambiguated taxon QB results #7978
Open
acwhite211 wants to merge 3 commits into v7_12_0_3
grantfitzsimmons
approved these changes
Apr 15, 2026
- See that the label now gets generated without any errors in the logs.
Fixes #7977
The change from #7509 fixed the QB disambiguation problem from #7436 by making taxon-related tree-rank fields resolve against the correct join path, but on some databases it also caused the generated SQL to exceed MariaDB's limit of 61 tables in a single join.
The fix keeps the disambiguation behavior from #7436, but limits the SQL change to taxon tree rank fields. Instead of building a full parent-chain join in the main query for every taxon path, the main query now joins only to the starting taxon alias, such as Taxon, PreferredTaxon, or HostTaxon. Any requested taxon rank is then resolved in a correlated subquery rooted at that specific alias. This preserves correct disambiguation between taxon-related columns while avoiding the MariaDB “too many tables in a join” failure.
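The query shape described above can be sketched with a toy example. This is only an illustration, not the SQL this PR generates: the schema is reduced to a few columns, the alias `CurrentTaxon`, the helper `rank_subquery`, the fixed walk depth, the rank id `180` for Genus, and the use of SQLite in place of MariaDB are all assumptions made for the sketch.

```python
import sqlite3

# Hypothetical, heavily simplified schema (real tables have many more columns).
SCHEMA = """
CREATE TABLE taxon (TaxonID INTEGER PRIMARY KEY, ParentID INTEGER, RankID INTEGER, Name TEXT);
CREATE TABLE determination (DeterminationID INTEGER PRIMARY KEY, TaxonID INTEGER, PreferredTaxonID INTEGER);
"""

GENUS_RANK = 180  # assumed rank id for Genus in this toy tree

TAXA = [
    (1, None, 140, "Felidae"),
    (2, 1, 180, "Leo"),
    (3, 2, 220, "leo"),       # synonym lineage: Felidae > Leo > leo
    (4, 1, 180, "Panthera"),
    (5, 4, 220, "leo"),       # preferred lineage: Felidae > Panthera > leo
]


def rank_subquery(start_alias: str, rank_id: int, depth: int = 3) -> str:
    """Build a scalar subquery that walks up to `depth` parents of `start_alias`
    and returns the name of the first ancestor (or self) with the given rank."""
    prefix = f"{start_alias}_up"
    joins = "\n".join(
        f"    LEFT JOIN taxon {prefix}{i} ON {prefix}{i}.TaxonID = {prefix}{i-1}.ParentID"
        for i in range(1, depth + 1)
    )
    cases = " ".join(
        f"WHEN {prefix}{i}.RankID = {rank_id} THEN {prefix}{i}.Name"
        for i in range(depth + 1)
    )
    return (
        f"(SELECT CASE {cases} END\n"
        f"    FROM taxon {prefix}0\n{joins}\n"
        f"    WHERE {prefix}0.TaxonID = {start_alias}.TaxonID)"
    )


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", TAXA)
    conn.execute("INSERT INTO determination VALUES (1, 3, 5)")
    # The outer query joins only the two starting aliases; each parent-chain
    # walk lives inside its own correlated subquery.
    query = f"""
    SELECT d.DeterminationID,
           {rank_subquery('CurrentTaxon', GENUS_RANK)} AS taxon_genus,
           {rank_subquery('PreferredTaxon', GENUS_RANK)} AS preferred_genus
    FROM determination d
    LEFT JOIN taxon CurrentTaxon ON CurrentTaxon.TaxonID = d.TaxonID
    LEFT JOIN taxon PreferredTaxon ON PreferredTaxon.TaxonID = d.PreferredTaxonID
    """
    return conn.execute(query).fetchall()
```

In this sketch the outer `FROM` clause contains three tables regardless of how many ranks are requested, and `run_demo()` returns one row whose genus columns differ (`Leo` for the determination's taxon, `Panthera` for the preferred taxon), because each subquery is rooted at its own alias.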
Query Comparison
old_query.sql:
The main query had one full taxon ancestry chain off determination.TaxonID, and PreferredTaxonID was only joined directly.
This kept the join count lower, but it could resolve disambiguated taxon fields against the wrong taxon path.
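To make the failure mode concrete, here is a toy reproduction of a rank value being read from the wrong lineage. It does not reproduce the actual old_query.sql; the schema, data, aliases `t0`/`t1`/`pref`, and the rank id `180` for Genus are all hypothetical, and SQLite stands in for MariaDB.

```python
import sqlite3


def old_shape_rows():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (TaxonID INTEGER PRIMARY KEY, ParentID INTEGER, RankID INTEGER, Name TEXT);
        CREATE TABLE determination (DeterminationID INTEGER PRIMARY KEY, TaxonID INTEGER, PreferredTaxonID INTEGER);
        INSERT INTO taxon VALUES
            (1, NULL, 140, 'Felidae'),
            (2, 1, 180, 'Leo'),
            (3, 2, 220, 'leo'),
            (4, 1, 180, 'Panthera'),
            (5, 4, 220, 'leo');
        -- TaxonID points at Leo > leo, PreferredTaxonID at Panthera > leo.
        INSERT INTO determination VALUES (1, 3, 5);
        """
    )
    # Only determination.TaxonID has an ancestry chain (t0 -> t1);
    # the preferred taxon is joined directly with no chain of its own.
    # The "preferred genus" column therefore reads from the TaxonID chain.
    return conn.execute(
        """
        SELECT pref.Name AS preferred_name,
               CASE WHEN t0.RankID = 180 THEN t0.Name
                    WHEN t1.RankID = 180 THEN t1.Name END AS preferred_genus
        FROM determination d
        LEFT JOIN taxon t0 ON t0.TaxonID = d.TaxonID
        LEFT JOIN taxon t1 ON t1.TaxonID = t0.ParentID
        LEFT JOIN taxon pref ON pref.TaxonID = d.PreferredTaxonID
        """
    ).fetchall()
```

Here the preferred taxon sits under `Panthera`, but the genus column comes back as `Leo`, because the only available ancestry chain belongs to the other taxon path.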
new_query.sql from the merged PR (#7509):
The main query built separate full ancestry chains for both TaxonID and PreferredTaxonID.
This fixed the disambiguation problem, but it increased the outer join graph from 50 joins / 18 taxon aliases to 66 joins / 34 taxon aliases, which is what triggered the MariaDB limit on some databases.
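The arithmetic behind those numbers can be checked quickly. The sketch below assumes that a query with N joins references N + 1 tables (the base table plus one per join); the helper name `exceeds_join_limit` is made up for illustration.

```python
MARIADB_MAX_TABLES_PER_JOIN = 61  # limit cited in this PR


def exceeds_join_limit(join_count: int, limit: int = MARIADB_MAX_TABLES_PER_JOIN) -> bool:
    # Assumption: N joins reference N + 1 tables (base table plus one per join).
    return join_count + 1 > limit


old = {"joins": 50, "taxon_aliases": 18}
new = {"joins": 66, "taxon_aliases": 34}

added_joins = new["joins"] - old["joins"]
added_aliases = new["taxon_aliases"] - old["taxon_aliases"]
```

Both differences come out to 16, so every added join is a taxon alias, and only the 66-join query crosses the limit under this counting.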
Initial changes in this PR:
The main query still keeps separate starting aliases for the taxon paths, so the disambiguation fix remains.
The difference is that the parent-chain walk for taxon ranks is moved into correlated subqueries, so those extra taxon ancestry chains are no longer part of the main query’s outer join graph.
The query stays correct, but the expensive taxon tree expansion is split into separate SELECTs instead of one oversized join.
Final changes in this PR:
The final version keeps the same disambiguated starting aliases, but resolves each requested taxon rank with a scalar correlated subquery that walks only the alias's own parent chain. So, instead of building a reusable taxon rank lookup for the whole tree, each taxon rank column now follows just the current Taxon or PreferredTaxon lineage and returns the matching rank value from that lineage.
This keeps the #7436 disambiguation fix, avoids the oversized outer-join graph that caused the MariaDB 61-table join failure, and also avoids the slower full-tree lookup plan from the initial draft of this PR. On the database that reproduced #7977, the problematic label query now completes successfully instead of timing out.
Checklist
self-explanatory (or properly documented)
Testing instructions