Avoid MariaDB join explosion for disambiguated taxon QB results #7978

Open
acwhite211 wants to merge 3 commits into v7_12_0_3 from issue-7977

@acwhite211 acwhite211 commented Apr 15, 2026

Fixes #7977

The change from #7509 fixed the QB disambiguation problem from #7436 by making taxon-related tree-rank fields resolve against the correct join path, but on some databases it also caused the generated SQL to exceed MariaDB's 61-table join limit.

The fix keeps the disambiguation behavior from #7436, but limits the SQL change to taxon tree rank fields. Instead of building a full parent-chain join in the main query for every taxon path, the main query now joins only to the starting taxon alias, such as Taxon, PreferredTaxon, or HostTaxon. Any requested taxon rank is then resolved in a correlated subquery rooted at that specific alias. This preserves correct disambiguation between taxon-related columns while avoiding the MariaDB “too many tables in a join” failure.

Query Comparison

old_query.sql:
The main query had one full taxon ancestry chain off determination.TaxonID, and PreferredTaxonID was only joined directly.
This kept the join count lower, but it could resolve disambiguated taxon fields against the wrong taxon path.

new_query.sql from the merged PR (#7509):
The main query built separate full ancestry chains for both TaxonID and PreferredTaxonID.
This fixed the disambiguation problem, but it increased the outer join graph from 50 joins / 18 taxon aliases to 66 joins / 34 taxon aliases, which is what triggered the MariaDB limit on some databases.
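
The arithmetic behind those numbers can be checked with a short sketch. It assumes that each join adds one table reference on top of a single base table, which is a simplification and not something stated in the PR; the helper names are hypothetical.

```python
# Join counts and taxon alias counts quoted in the comparison above.
OLD_QUERY = {"joins": 50, "taxon_aliases": 18}
NEW_QUERY = {"joins": 66, "taxon_aliases": 34}

MARIADB_MAX_TABLES_IN_JOIN = 61  # limit cited in this PR description


def tables_referenced(join_count: int) -> int:
    """Assumption: one base table plus one table reference per join."""
    return join_count + 1


def exceeds_limit(join_count: int) -> bool:
    """True when the outer join graph would go past the 61-table limit."""
    return tables_referenced(join_count) > MARIADB_MAX_TABLES_IN_JOIN


# The second full ancestry chain accounts for the whole increase.
extra_joins = NEW_QUERY["joins"] - OLD_QUERY["joins"]
extra_aliases = NEW_QUERY["taxon_aliases"] - OLD_QUERY["taxon_aliases"]
```

Under that assumption the old query stays under the limit and the #7509 query goes over it, and the extra joins match the extra taxon aliases one for one.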

Initial changes in this PR:
The main query still keeps separate starting aliases for the taxon paths, so the disambiguation fix remains.
The difference is that the parent-chain walk for taxon ranks is moved into correlated subqueries, so those extra taxon ancestry chains are no longer part of the main query’s outer join graph.
The query stays correct, but the expensive taxon tree expansion is split into separate SELECTs instead of one oversized join.

Final changes in this PR:
The final version keeps the same disambiguated starting aliases, but resolves each requested taxon rank with a scalar correlated subquery that walks only the alias's own parent chain. So, instead of building a reusable taxon rank lookup for the whole tree, each taxon rank column now follows just the current Taxon or PreferredTaxon lineage and returns the matching rank value from that lineage.

This keeps the #7436 disambiguation fix, avoids the oversized outer-join graph that caused the MariaDB 61-table join failure, and also avoids the slower full tree lookup plan from the initial draft of this PR. On the database that reproduced #7977, the problematic label query now completes successfully instead of timing out.
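
As a rough illustration of the query shape described above, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for MariaDB. It is not the code from this PR: the simplified `taxon` and `determination` schemas, the `tx`/`ptx` aliases, the rank id `180`, the fixed chain depth, and the `rank_subquery` helper are all assumptions made for the example.

```python
import sqlite3

GENUS_RANK_ID = 180  # illustrative rank id, not taken from the PR


def rank_subquery(root_alias: str, rank_id: int, depth: int = 3) -> str:
    """Build a scalar subquery that walks root_alias's own parent chain.

    t0 is the starting taxon, t1..t(depth-1) are its successive parents.
    The CASE returns the name of the first ancestor at the requested rank.
    """
    joins = " ".join(
        f"LEFT JOIN taxon t{i} ON t{i}.TaxonID = t{i - 1}.ParentID"
        for i in range(1, depth)
    )
    cases = " ".join(
        f"WHEN t{i}.RankID = {int(rank_id)} THEN t{i}.Name" for i in range(depth)
    )
    return (
        f"(SELECT CASE {cases} END FROM taxon t0 {joins} "
        f"WHERE t0.TaxonID = {root_alias}.TaxonID)"
    )


def run_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (
            TaxonID INTEGER PRIMARY KEY, ParentID INTEGER,
            RankID INTEGER, Name TEXT);
        CREATE TABLE determination (
            DeterminationID INTEGER PRIMARY KEY,
            TaxonID INTEGER, PreferredTaxonID INTEGER);
        INSERT INTO taxon VALUES
            (1, NULL, 140, 'Felidae'),
            (2, 1, 180, 'Felis'),
            (3, 2, 220, 'catus'),
            (4, 1, 180, 'Panthera'),
            (5, 4, 220, 'leo');
        INSERT INTO determination VALUES (1, 3, 5);
    """)
    # The outer query joins only the two starting aliases; each rank
    # column is resolved inside its own correlated subquery.
    sql = f"""
        SELECT {rank_subquery('tx', GENUS_RANK_ID)} AS genus,
               {rank_subquery('ptx', GENUS_RANK_ID)} AS preferred_genus
        FROM determination d
        LEFT JOIN taxon tx ON tx.TaxonID = d.TaxonID
        LEFT JOIN taxon ptx ON ptx.TaxonID = d.PreferredTaxonID
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

In this sketch the outer join graph contains two taxon joins no matter how deep the parent chain goes, and each column stays tied to its own starting alias, so `genus` and `preferred_genus` can differ for the same determination row.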

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list

Testing instructions

  • Run the label generating query that caused a MariaDB 'Too many table joins' error.
  • See that the label now gets generated without any errors in the logs.

@grantfitzsimmons grantfitzsimmons left a comment

  • See that the label now gets generated without any errors in the logs.

