add dynamic RefCDS definition for dNdScv #444

Open
FerriolCalvet wants to merge 16 commits into dev from feat/dynamic-refcds

@FerriolCalvet
Member

No description provided.

- select the correct gene column
- remove '-' as a gene

missing:
- check if there is an additional ',' at the end
- new container for query independent from dNdS
Missing:
- container dependencies should be installed but are not found
- Merge remote-tracking branch 'origin/custom-refcds' into feat/dynamic-refcds
- add script from intogen repo
- add automatic querying to Ensembl biomart
- filter regions based on consensus panel
- output dNdS: cv, loc and global
- output warning of false splicesites
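The gene-list fixes named in the commits above (pick the gene column, drop '-', guard against a trailing ',') can be sketched as follows. This is only an illustration, not the code in this PR: the column name `GENE`, the function name `panel_gene_list`, and the example rows are assumptions.

```python
import csv
import io


def panel_gene_list(panel_tsv, gene_column="GENE"):
    """Collect unique gene symbols from a tab-separated panel file.

    `gene_column` is a hypothetical column name; adjust it to the real panel header.
    """
    reader = csv.DictReader(panel_tsv, delimiter="\t")
    genes = set()
    for row in reader:
        # Split on ',' and strip, so a trailing ',' never yields an empty gene.
        for gene in row[gene_column].split(","):
            gene = gene.strip()
            # '-' marks "no gene" in this sketch and is skipped.
            if gene and gene != "-":
                genes.add(gene)
    return sorted(genes)


# Made-up panel rows, for illustration only.
example = io.StringIO(
    "CHROM\tPOS\tGENE\n"
    "chr17\t7670000\tTP53\n"
    "chr1\t100\t-\n"
    "chr12\t25200000\tKRAS,\n"
)
gene_query = ",".join(panel_gene_list(example))
```

Joining the cleaned list at the end, rather than concatenating while reading, is what keeps the query string free of a dangling ','.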
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the DEEPCSA dNdScv workflow to build a panel-specific RefCDS dynamically (via BioMart + panel reformatting + dndscv::buildref) and adapts dNdScv output handling/publishing accordingly.

Changes:

  • Extend the DNDS subworkflow to query Ensembl BioMart, filter/align panel regions to codons, and build a custom RefCDS_custom.rda from the provided reference FASTA.
  • Split RUN_DNDS outputs into separate CV / global / local TSVs and update the R runner script to write those files using an --outputprefix.
  • Add publishing rules for dNdScv outputs and the filtered biomart/splice-site artifacts.

Reviewed changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| workflows/deepcsa.nf | Updates the DNDS(...) invocation to pass bed/panel/fasta needed for dynamic RefCDS generation. |
| subworkflows/local/dnds/main.nf | Adds BioMart → filter → RefCDS build steps and aggregates per-sample outputs into combined TSVs. |
| modules/local/dnds/run/main.nf | Changes RUN_DNDS to emit three distinct result types and uses --outputprefix. |
| modules/local/dnds/querybiomart/main.nf | New process to extract gene list from panel and query Ensembl BioMart. |
| modules/local/dnds/filterbiomart/main.nf | New process to codon-align/filter biomart CDS rows to the panel and produce splice-site classification. |
| modules/local/dnds/buildref/main.nf | New process to call dndscv::buildref() and create RefCDS_custom.rda. |
| conf/results_outputs.config | Adds publishDir routing for dNdScv outputs and FilterBioMart artifacts. |
| conf/modules.config | Adjusts SUBSET_DNDS column selection (CHROM vs CHROM_ensembl). |
| bin/dNdScv_panel_prep.py | New script to intersect panel BED with CDS exons, enforce codon alignment, and report real vs artificial splice junctions. |
| bin/dNdS_run.R | Switches from --outputfile to --outputprefix and writes .cv/.globaldnds/.loc TSVs. |
| bin/biomart_query.R | New helper script to submit BioMart XML queries and write raw CDS exon rows. |
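To make the codon-alignment and splice-junction step described for bin/dNdScv_panel_prep.py more concrete, here is a minimal sketch of one way such trimming could work. It is not the script from this PR. It assumes plus-strand exons, 1-based inclusive coordinates, a panel segment already clipped to lie inside the exon, and that a boundary counts as "real" only when it coincides with the annotated exon boundary. The names `codon_align`, `classify_boundaries`, and `exon_cds_offset` are hypothetical.

```python
from typing import Optional, Tuple


def codon_align(exon_start: int, exon_cds_offset: int,
                seg_start: int, seg_end: int) -> Optional[Tuple[int, int]]:
    """Trim a panel segment inside one exon so it covers whole codons only.

    exon_cds_offset is the 0-based CDS position of the base at exon_start.
    Codons split across the segment edges are dropped in this sketch.
    """
    # 0-based CDS positions of the first and last base of the segment.
    first = exon_cds_offset + (seg_start - exon_start)
    last = exon_cds_offset + (seg_end - exon_start)
    # Move the start forward to the next codon start (CDS position % 3 == 0).
    trim_left = (-first) % 3
    # Move the end back to the previous codon end (CDS position % 3 == 2).
    trim_right = (last + 1) % 3
    new_start = seg_start + trim_left
    new_end = seg_end - trim_right
    if new_end - new_start + 1 < 3:
        return None  # no complete codon left
    return new_start, new_end


def classify_boundaries(exon_start: int, exon_end: int,
                        seg_start: int, seg_end: int) -> Tuple[str, str]:
    """Label each segment edge 'real' if it matches the exon edge, else 'artificial'."""
    left = "real" if seg_start == exon_start else "artificial"
    right = "real" if seg_end == exon_end else "artificial"
    return left, right


# Hypothetical exon 100-120 starting at CDS position 0, panel segment 100-110.
aligned = codon_align(100, 0, 100, 110)
labels = classify_boundaries(100, 120, *aligned)
```

In this example the right edge is cut by the panel rather than by the annotation, so it is the kind of junction the overview calls artificial.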

Comment thread subworkflows/local/dnds/main.nf Outdated
Comment thread modules/local/dnds/run/main.nf
Comment thread modules/local/dnds/querybiomart/main.nf Outdated
Comment thread modules/local/dnds/filterbiomart/main.nf Outdated
Comment thread modules/local/dnds/buildref/main.nf Outdated
Comment thread subworkflows/local/dnds/main.nf
Comment thread workflows/deepcsa.nf
Comment thread modules/local/dnds/buildref/main.nf
@FerriolCalvet FerriolCalvet linked an issue Apr 11, 2026 that may be closed by this pull request
@FerriolCalvet
Member Author

We should make sure that the transcripts selected in our consensus panel are the same as the ones downloaded from the Ensembl BioMart query; otherwise we could end up with several artificial splice sites that are avoidable.
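A check along these lines could look like the sketch below. It assumes the panel provides one transcript per gene and that the BioMart rows can be grouped into a set of transcripts per gene. The function name and the placeholder IDs are hypothetical, not taken from the pipeline.

```python
def transcript_mismatches(panel_tx, biomart_tx):
    """Return genes whose panel transcript is missing from the BioMart result.

    panel_tx:   dict mapping gene -> transcript ID chosen in the consensus panel
    biomart_tx: dict mapping gene -> set of transcript IDs returned by the query
    """
    mismatches = {}
    for gene, tx in panel_tx.items():
        found = biomart_tx.get(gene, set())
        if tx not in found:
            # Keep both sides so the warning can show what was expected and what came back.
            mismatches[gene] = (tx, sorted(found))
    return mismatches


# Placeholder identifiers, for illustration only.
panel = {"GENE_A": "TX_A1", "GENE_B": "TX_B1"}
biomart = {"GENE_A": {"TX_A1"}, "GENE_B": {"TX_B2"}}
report = transcript_mismatches(panel, biomart)
```

Any gene that shows up in `report` is a candidate source of artificial splice sites and could be flagged before the RefCDS is built.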

@FerriolCalvet FerriolCalvet deleted the branch dev April 11, 2026 15:40
@FerriolCalvet FerriolCalvet reopened this Apr 11, 2026
Member

@FedericaBrando FedericaBrando left a comment

Overall it looks good, I think I will steal the nextflow process to implement the build of the custom RefCDS in IntOGen ahah

I added some comments. Regarding BioMart, I would not reinvent the wheel: the query as implemented now is a bit fragile, so I would use the Ensembl packages instead. This is better for future development (a raw query is harder to debug and understand than a documented package) and for robustness (better error handling).

Another nitpick is about naming the process "FILTER_BIOMART": I think it doesn't fully explain what the process is doing. I would use the word ADAPT, or something that lets the user understand that the step is required for the next one rather than being tied to the previous one (not sure if I am clear, I can extend the explanation if needed).

Anyway, overall it looks good! I will leave the review as a comment; it is up to you whether you address the comments above or merge. Great job 🚀

Comment thread bin/biomart_query.R Outdated
Comment thread modules/local/dnds/filterbiomart/main.nf Outdated
Comment thread modules/local/dnds/adaptpanelrefcds/main.nf
@FerriolCalvet
Member Author

Thanks for the comments, Fede! I will address them and then merge, since it is currently not fully functional for bigger panels and passing this problem on to Ensembl functions sounds better.

Development

Successfully merging this pull request may close these issues.

Build RefCDS object. maybe from consensus exons panel?

3 participants