
When pinning ancestor, treat it as an outgroup#1945

Open
glennhickey wants to merge 2 commits into master from includeroot-as-outgroup


@glennhickey

Collaborator

This keeps the pinned ancestor (via --includeRoot) largely out of the chaining, which should help increase coverage in some cases.

Also adds cactus-hal2seqfile, a convenience tool to help update existing alignments (I will develop it a little more in future PRs).

glennhickey and others added 2 commits June 23, 2026 11:31
… option)

When the root's sequence is supplied (cactus-blast/cactus-align --includeRoot, or the integrated cactus when an ancestor sequence is given in the seqfile), cactus's pairwise chaining previously treated that pinned root as an extra ingroup. Ingroups that align better to each other than to the (often deeper) root then take each other as their primary alignments and starve the root. This collapses the root's coverage of its descendants, which is the bridge connecting a re-aligned subclade to the rest of the alignment after halReplaceGenome.

Treat the pinned root as an outgroup instead, via a new <blast includeRootAsOutgroup> config option that defaults to on. This changes the default --includeRoot behaviour everywhere make_paf_alignments sees a supplied root (cactus-blast and the integrated cactus progressive path alike); it is blast/PAF-side only, so the consolidated/HAL output is unchanged in structure. To restore the old ingroup behaviour, pass a custom --configFile with includeRootAsOutgroup="0".
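As a rough illustration of preparing such a custom config, the sketch below flips the attribute on a copy of the config XML before it is passed via --configFile. It assumes the option is a plain attribute on a `<blast>` element (as the `<blast includeRootAsOutgroup>` notation suggests) and that "1" means on; the helper name `set_include_root_as_outgroup` and the toy config string are hypothetical and are not part of cactus.

```python
import xml.etree.ElementTree as ET


def set_include_root_as_outgroup(config_xml: str, enabled: bool) -> str:
    """Return config_xml with includeRootAsOutgroup set on every <blast> element.

    Hypothetical helper: assumes the option is an attribute of <blast>
    and that "1"/"0" encode on/off.
    """
    root = ET.fromstring(config_xml)
    # iter() walks the whole tree, including the root element itself.
    blast_elems = list(root.iter("blast"))
    if not blast_elems:
        raise ValueError("no <blast> element found in config")
    for blast in blast_elems:
        blast.set("includeRootAsOutgroup", "1" if enabled else "0")
    return ET.tostring(root, encoding="unicode")


# Toy stand-in for a cactus config; a real config has many more elements
# and attributes, and its exact layout is assumed here.
toy_config = '<cactusWorkflowConfig><blast includeRootAsOutgroup="1"/></cactusWorkflowConfig>'

# Restore the old ingroup behaviour in the copy.
old_behaviour_config = set_include_root_as_outgroup(toy_config, enabled=False)
print(old_behaviour_config)
```

Writing the returned string to a file and pointing --configFile at it would then select the old behaviour, leaving the shipped default config untouched.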

Verified on the evolver (human,mouse,rat)Anc1 multifurcation: with the default config, --includeRoot alone gives root->mouse coverage of 464k (73%); with includeRootAsOutgroup="0" it falls to 125k (20%). mafComparator against the ground-truth alignment also showed improved precision and recall.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ubtree

For a given --root ancestor (default: the HAL's tree root), export the ancestor and every genome below it as bgzipped per-genome FASTAs (one parallel Toil hal2fasta | bgzip job each) and write a matching seqfile. The ancestor's own sequence is included, so the output is ready to re-align and pin a subclade with cactus-blast/cactus-align --includeRoot.

The HAL is symlinked into the jobstore and read by each per-genome job via symlink, and each job's disk is sized to its genome rather than the whole HAL, so this scales to large HALs without copying them once per genome.
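To make the shape of the export described above concrete, here is a minimal sketch of selecting a subtree and writing a matching seqfile (a Newick tree line followed by one `name path` line per genome). It is not the cactus-hal2seqfile implementation: the tree is given as a plain parent-to-children dict rather than read from a HAL, branch lengths are omitted, and the `Anc0`/`dog` genomes, the function names, and the `<out_dir>/<name>.fa.gz` naming are all assumptions made for illustration.

```python
from typing import Dict, List


def subtree_genomes(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return root followed by every genome below it, in pre-order."""
    names = [root]
    for child in children.get(root, []):
        names.extend(subtree_genomes(children, child))
    return names


def subtree_newick(children: Dict[str, List[str]], root: str) -> str:
    """Render the subtree under root as Newick (no branch lengths, no ';')."""
    kids = children.get(root, [])
    if not kids:
        return root
    return "(" + ",".join(subtree_newick(children, k) for k in kids) + ")" + root


def seqfile_text(children: Dict[str, List[str]], root: str, out_dir: str) -> str:
    """Build seqfile text: a tree line, then one 'name path' line per genome.

    The ancestor itself gets a line too, so its sequence can be pinned.
    The path pattern is hypothetical.
    """
    lines = [subtree_newick(children, root) + ";"]
    for name in subtree_genomes(children, root):
        lines.append(f"{name} {out_dir}/{name}.fa.gz")
    return "\n".join(lines) + "\n"


# Toy tree: Anc1 is the multifurcation from the commit message; Anc0 and dog
# are made-up context so that --root selects a proper subtree.
toy_tree = {"Anc0": ["Anc1", "dog"], "Anc1": ["human", "mouse", "rat"]}
print(seqfile_text(toy_tree, "Anc1", "out"))
```

In this sketch each genome name would map to one independent export job, which is why the per-genome list is computed separately from the tree line.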

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>