- select the correct gene column
- remove '-' as a gene
Missing:
- check if there is an additional ',' at the end

- new container for query independent from dNdS
Missing:
- container dependencies should be installed but are not found

- Merge remote-tracking branch 'origin/custom-refcds' into feat/dynamic-refcds
- add script from intogen repo

- add automatic querying to Ensembl biomart
- filter regions based on consensus panel
- output dNdS: cv, loc and global
- output warning of false splice sites
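The gene-column clean-up mentioned in the first commit (dropping '-' and tolerating a trailing ',') could look roughly like the sketch below. It is only an illustration: `clean_genes` is a hypothetical helper, and it assumes the gene column holds comma-separated gene names, which may not match the actual panel file.

```python
from typing import Iterable, List


def clean_genes(values: Iterable[str]) -> List[str]:
    """Collect unique gene names from comma-separated column values.

    Hypothetical helper: assumes each value is a comma-separated list of
    gene names, that '-' marks "no gene", and that a trailing ',' may occur.
    """
    genes = set()
    for value in values:
        for name in value.split(","):
            name = name.strip()
            # An empty string comes from a trailing ','; '-' is a placeholder.
            if name and name != "-":
                genes.add(name)
    return sorted(genes)


if __name__ == "__main__":
    print(clean_genes(["TP53,", "-", "KRAS,TP53", " NRAS "]))
```

Returning a sorted list keeps the resulting gene list deterministic between runs.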
Pull request overview
This PR updates the DEEPCSA dNdScv workflow to build a panel-specific RefCDS dynamically (via BioMart + panel reformatting + dndscv::buildref) and adapts dNdScv output handling/publishing accordingly.
Changes:
- Extend the `DNDS` subworkflow to query Ensembl BioMart, filter/align panel regions to codons, and build a custom `RefCDS_custom.rda` from the provided reference FASTA.
- Split `RUN_DNDS` outputs into separate CV / global / local TSVs and update the R runner script to write those files using an `--outputprefix`.
- Add publishing rules for dNdScv outputs and the filtered biomart/splice-site artifacts.
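To make the "filter/align panel regions to codons" step concrete, here is a minimal sketch of one way such trimming could work. It is not taken from `bin/dNdScv_panel_prep.py`: `codon_align` is a hypothetical function, and it assumes plus-strand CDS exons in transcript order, 1-based inclusive coordinates, and that codons split across an exon junction are simply dropped.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def codon_align(exons: List[Interval], panel: Interval) -> List[Interval]:
    """Keep the part of each CDS exon covered by `panel`, trimmed to whole codons.

    Assumptions (illustrative only): plus strand, exons given in transcript
    order, and codons that span two exons are not kept.
    """
    kept = []
    cds_offset = 0  # number of coding bases before the current exon
    for ex_start, ex_end in exons:
        start = max(ex_start, panel[0])
        end = min(ex_end, panel[1])
        if start <= end:
            first = cds_offset + (start - ex_start)  # 0-based CDS position
            last = cds_offset + (end - ex_start)
            start += (-first) % 3     # move forward to the next codon start
            end -= (last + 1) % 3     # move back to the previous codon end
            if end - start + 1 >= 3:
                kept.append((start, end))
        cds_offset += ex_end - ex_start + 1
    return kept


if __name__ == "__main__":
    print(codon_align([(100, 109), (200, 210)], (102, 205)))
```

Every returned segment has a length that is a multiple of three, so the retained sequence stays in frame with the original CDS.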
Reviewed changes
Copilot reviewed 8 out of 11 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| workflows/deepcsa.nf | Updates the DNDS(...) invocation to pass bed/panel/fasta needed for dynamic RefCDS generation. |
| subworkflows/local/dnds/main.nf | Adds BioMart → filter → RefCDS build steps and aggregates per-sample outputs into combined TSVs. |
| modules/local/dnds/run/main.nf | Changes RUN_DNDS to emit three distinct result types and uses --outputprefix. |
| modules/local/dnds/querybiomart/main.nf | New process to extract gene list from panel and query Ensembl BioMart. |
| modules/local/dnds/filterbiomart/main.nf | New process to codon-align/filter biomart CDS rows to the panel and produce splice-site classification. |
| modules/local/dnds/buildref/main.nf | New process to call dndscv::buildref() and create RefCDS_custom.rda. |
| conf/results_outputs.config | Adds publishDir routing for dNdScv outputs and FilterBioMart artifacts. |
| conf/modules.config | Adjusts SUBSET_DNDS column selection (CHROM vs CHROM_ensembl). |
| bin/dNdScv_panel_prep.py | New script to intersect panel BED with CDS exons, enforce codon alignment, and report real vs artificial splice junctions. |
| bin/dNdS_run.R | Switches from --outputfile to --outputprefix and writes .cv/.globaldnds/.loc TSVs. |
| bin/biomart_query.R | New helper script to submit BioMart XML queries and write raw CDS exon rows. |
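As an illustration of the "real vs artificial splice junctions" report, the sketch below shows one possible classification rule. It is an assumption, not the logic of `bin/dNdScv_panel_prep.py`: `classify_junctions` is hypothetical, and a junction between two consecutive retained segments is called "real" only when it coincides with an annotated exon end and an annotated exon start.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def classify_junctions(
    exons: List[Interval], segments: List[Interval]
) -> List[Tuple[int, int, str]]:
    """Label each junction between consecutive retained segments.

    Assumed rule (illustrative only): a junction is "real" when the left
    segment ends at an annotated exon end and the right segment starts at
    an annotated exon start; anything else was created by panel trimming.
    """
    exon_starts = {start for start, _ in exons}
    exon_ends = {end for _, end in exons}
    report = []
    for (_, left_end), (right_start, _) in zip(segments, segments[1:]):
        real = left_end in exon_ends and right_start in exon_starts
        report.append((left_end, right_start, "real" if real else "artificial"))
    return report


if __name__ == "__main__":
    exons = [(100, 109), (200, 210), (300, 320)]
    segments = [(103, 109), (200, 204), (300, 311)]
    for junction in classify_junctions(exons, segments):
        print(junction)
```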
We should make sure that the transcripts selected in our consensus panel and the ones downloaded from the Ensembl BioMart query are the same; otherwise this could lead to several artificial splice sites that could be avoided.
- not tested
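A check along these lines could be as simple as comparing the transcript chosen per gene in both sources. The sketch below is hypothetical: `transcript_mismatches` and the gene-to-transcript mappings are assumptions about how the panel and the BioMart result might be represented, and the IDs in the example are placeholders.

```python
from typing import Dict, Optional, Tuple


def transcript_mismatches(
    panel: Dict[str, str], biomart: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return genes whose panel transcript differs from the BioMart one.

    Both inputs are assumed to map gene name -> transcript ID. A gene that
    is missing from `biomart` is reported with None as its BioMart value.
    """
    return {
        gene: (tx, biomart.get(gene))
        for gene, tx in panel.items()
        if biomart.get(gene) != tx
    }


if __name__ == "__main__":
    panel = {"GENE_A": "TX1", "GENE_B": "TX2", "GENE_C": "TX3"}
    biomart = {"GENE_A": "TX1", "GENE_B": "TX9"}
    print(transcript_mismatches(panel, biomart))
```

An empty result would mean both sources agree, so any remaining artificial splice sites would come from the panel regions rather than from a transcript mismatch.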
FedericaBrando left a comment
Overall it looks good, I think I will steal the Nextflow process to implement the build of the custom RefCDS in IntOGen ahah
I added some comments. Regarding BioMart, I would not reinvent the wheel: the query as it is implemented now is a bit fragile. I would use the Ensembl packages instead, which is good for future development (a raw query is harder to debug and understand than a documented package) and for robustness (better error handling).
Another nitpick is about naming the process "FILTER_BIOMART": I think it doesn't fully explain what the process is doing. I would use the word ADAPT, or something that lets the user understand that the step is required for the next one and is not so much linked to the previous one (not sure if I am clear, I can extend the explanation if needed).
Anyway, overall looks good! I will leave the review as a comment; it is up to you whether you want to address the comments above or merge! Great job 🚀
Thanks for the comments, Fede! I will address them and then merge it, since it is currently not fully functional for bigger panels, and passing this problem on to Ensembl functions sounds better.