
Commit ea9d4e3

Merge pull request #440 from bbglab/dev
release v1.1.0
2 parents e7ace44 + d1de688 commit ea9d4e3

219 files changed

Lines changed: 13324 additions & 4591 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -13,3 +13,5 @@ ste_notes.txt
 assets/HDP_files*
 scratch/
 scratchhhh/
+tests/test_data/all_samples.somatic.mutations.maf
+tests/test_data/all_samples_indv.depths.tsv.gz
```

README.md

Lines changed: 15 additions & 5 deletions
````diff
@@ -16,8 +16,8 @@ First, prepare a samplesheet with your input data that looks as follows:
 
 ```csv
 sample,vcf,bam
-sample1,sample1.high.filtered.vcf,sample1.sorted.bam
-sample2,sample2.high.filtered.vcf,sample2.sorted.bam
+sample1,sample1.filtered.vcf,sample1.sorted.bam
+sample2,sample2.filtered.vcf,sample2.sorted.bam
 ```
 
 Each row represents a single sample with a single-sample VCF containing the mutations called in that sample and the BAM file that was used for getting those variant calls. The mutations will be obtained from the VCF and the BAM file will be used for computing the sequencing depth at each position and using this for the downstream analysis.
@@ -28,8 +28,6 @@ Each row represents a single sample with a single-sample VCF containing the muta
 
 After making sure that these files are ready, you can now run the pipeline using:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 git clone https://github.com/bbglab/deepCSA.git
 cd deepCSA
@@ -74,4 +72,16 @@ We are working to provide the biggest possible detail on the [usage](docs/usage.
 
 ## Publications
 
-> [Sex and smoking bias in the selection of somatic mutations in human bladder](https://www.nature.com/articles/s41586-025-09521-x)
+> **Sex and smoking bias in the selection of somatic mutations in human bladder**
+>
+> Ferriol Calvet*, Raquel Blanco Martinez-Illescas*, Ferran Muiños, Maria Tretiakova, Elena S. Latorre-Esteves, Jeanne Fredrickson, Maria Andrianova, Stefano Pellegrini, Axel Rosendahl Huber, Joan Enric Ramis-Zaldivar, Shuyi Charlotte An, Elana Thieme, Brendan F. Kohrn, Miguel L. Grau, Abel Gonzalez-Perez, Nuria Lopez-Bigas & Rosa Ana Risques
+>
+>_Nature_ (2025) doi:[10.1038/s41586-025-09521-x](https://doi.org/10.1038/s41586-025-09521-x)
+>
+> *these authors contributed equally and the order was decided randomly
+
+> **DeepClone, an end-to-end protocol to study somatic mutagenesis and selection at high resolution**
+>
+> Ferriol Calvet, Morena Pinheiro-Santin, Erika Lopez, Raquel Blanco Martinez-Illescas, Núria Samper, Miguel L. Grau, Ferran Muiños, Rocío Chamorro González, Maria Andrianova, Federica Brando, Stefano Pellegrini, Marta Huertas, Elisabet Figuerola-Bou, Coohleen Coombes, Brendan F. Kohrn, Jeanne Fredrickson, Rosa Ana Risques, Nuria Lopez-Bigas, Abel Gonzalez-Perez
+>
+> _protocols.io_ (2026) doi:[10.17504/protocols.io.dm6gp1jodgzp/v2](https://dx.doi.org/10.17504/protocols.io.dm6gp1jodgzp/v2)
````
Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+general:
+  correct_pvals: true
+  elements: null
+  handle_na: ignore
+  model: linear
+  multi: true
+  output_dir: ./
+  predictor_random_effect: null
+  predictors: null
+  predictors_file: null
+  predictors_intercept_0: null
+  predictors_multi_force: null
+  sample_column: null
+  samples: null
+  significance_threshold: 0.2
+metrics:
+  metric_1:
+    adjust: false
+    elements_total_by: included
+    file: all_mutdensities.tsv
+    metric_name: mutdensity
+    muttype:
+    - snv
+    - snv_indel
+    region:
+    - protein_affecting
+    - non_protein_affecting
+    samples_total_by: included
+plot:
+  predictors_colors: null
+  predictors_names: null
```
Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+general:
+  correct_pvals: true
+  elements: null
+  handle_na: ignore
+  model: linear
+  multi: true
+  output_dir: ./
+  predictor_random_effect: null
+  predictors: null
+  predictors_file: null
+  predictors_intercept_0: null
+  predictors_multi_force: null
+  sample_column: null
+  samples: null
+  significance_threshold: 0.2
+metrics:
+  metric_1:
+    elements_total_by: included
+    file: all_omegas.tsv
+    global_loc: false
+    impact:
+    - missense
+    - truncating
+    - nonsynonymous_splice
+    metric_name: omega
+    multi: false
+    samples_total_by: included
+    significance_threshold: 1
+plot:
+  predictors_colors: null
+  predictors_names: null
```
Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+general:
+  correct_pvals: true
+  elements: null
+  handle_na: ignore
+  model: linear
+  multi: true
+  output_dir: ./
+  predictor_random_effect: null
+  predictors: null
+  predictors_file: null
+  predictors_intercept_0: null
+  predictors_multi_force: null
+  sample_column: null
+  samples: null
+  significance_threshold: 0.2
+metrics:
+  metric_1:
+    elements_total_by: included
+    file: all_omegas_global_loc.tsv
+    global_loc: true
+    impact:
+    - missense
+    - truncating
+    - nonsynonymous_splice
+    metric_name: omega
+    multi: false
+    samples_total_by: included
+    significance_threshold: 1
+plot:
+  predictors_colors: null
+  predictors_names: null
```
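The three new configs share identical `general` and `plot` sections; the two omega configs also share almost all of `metric_1`. A quick way to confirm what actually varies is a recursive comparison of the loaded configs. The sketch below is a hypothetical helper (`diff_configs` is not part of deepCSA), applied to the `metric_1` blocks transcribed by hand from the two omega diffs above, with YAML `true`/`false` written as Python `True`/`False`.

```python
def diff_configs(a, b, prefix=""):
    """Recursively compare two nested dicts.

    Returns {dotted_key: (value_in_a, value_in_b)} for every leaf that differs.
    A key missing on one side compares as None.
    """
    out = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            out.update(diff_configs(va, vb, path + "."))
        elif va != vb:
            out[path] = (va, vb)
    return out


# metric_1 blocks of the two omega configs, transcribed from the diffs above.
omega = {
    "metric_1": {
        "elements_total_by": "included",
        "file": "all_omegas.tsv",
        "global_loc": False,
        "impact": ["missense", "truncating", "nonsynonymous_splice"],
        "metric_name": "omega",
        "multi": False,
        "samples_total_by": "included",
        "significance_threshold": 1,
    }
}
omega_global_loc = {
    "metric_1": {
        "elements_total_by": "included",
        "file": "all_omegas_global_loc.tsv",
        "global_loc": True,
        "impact": ["missense", "truncating", "nonsynonymous_splice"],
        "metric_name": "omega",
        "multi": False,
        "samples_total_by": "included",
        "significance_threshold": 1,
    }
}

print(diff_configs(omega, omega_global_loc))
# {'metric_1.file': ('all_omegas.tsv', 'all_omegas_global_loc.tsv'), 'metric_1.global_loc': (False, True)}
```

With PyYAML installed, `yaml.safe_load` on each file yields dicts of the same shape, so the helper can be pointed at the full configs instead of hand-transcribed fragments.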

assets/schema_input.json

Lines changed: 3 additions & 3 deletions
```diff
@@ -25,14 +25,14 @@
             "pileup_bam": {
                 "type": "string",
                 "pattern": "^\\S+\\.bam$",
-                "errorMessage": "pileup BAM file for sample must be provided, cannot contain spaces and must have extension '.bam'"
+                "errorMessage": "BAM file for computing a pileup per sample, it must have extension '.bam'"
             },
             "pileup_ind": {
                 "type": "string",
                 "pattern": "^\\S+\\.bam.csi$",
-                "errorMessage": "BAM index pilup file for sample must be provided, cannot contain spaces and must have extension '.bam.csi'"
+                "errorMessage": "BAM index file for computing a pileup per sample, it must have extension '.bam.csi'"
             }
         },
-        "required": ["sample", "vcf", "bam"]
+        "required": ["sample", "vcf"]
     }
 }
```
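The net effect of this hunk is that a samplesheet row now only needs `sample` and `vcf`; the `pileup_bam` and `pileup_ind` columns stay optional but must still match their patterns when present. The sketch below mirrors only the constraints visible in this hunk; it is not how the pipeline validates input (that is done against the JSON schema itself), and `check_row` plus the treatment of empty strings as "absent" are assumptions for illustration.

```python
import re

# Constraints visible in this hunk of assets/schema_input.json.
# The definitions of sample, vcf and bam lie outside the hunk and are not reproduced.
REQUIRED = ("sample", "vcf")  # "bam" was dropped from this list in this commit
PATTERNS = {
    "pileup_bam": re.compile(r"^\S+\.bam$"),
    # Same regex as the schema: the '.' before "csi" is unescaped, so it matches any character.
    "pileup_ind": re.compile(r"^\S+\.bam.csi$"),
}


def check_row(row):
    """Return a list of problems for one samplesheet row (dict of column -> value).

    Hypothetical helper: empty or missing values count as absent.
    """
    problems = [f"missing required column '{c}'" for c in REQUIRED if not row.get(c)]
    for col, pattern in PATTERNS.items():
        value = row.get(col)
        if value and not pattern.match(value):
            problems.append(f"{col}: '{value}' does not match {pattern.pattern}")
    return problems


print(check_row({"sample": "sample1", "vcf": "sample1.filtered.vcf"}))
# []
print(check_row({"sample": "sample1", "vcf": "sample1.filtered.vcf", "pileup_bam": "sample1.cram"}))
# ["pileup_bam: 'sample1.cram' does not match ^\\S+\\.bam$"]
```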

assets/useful_scripts/deepcsa_maf2samplevcfs.py

Lines changed: 13 additions & 5 deletions
```diff
@@ -43,6 +43,8 @@ def build_vcf_like_dataframe(mutations_dataframe, samplee):
     Build a VCF-like dataframe from the mutations dataframe.
     input needs to have:
     ['CHROM', 'POS', 'REF', 'ALT', 'DEPTH', 'ALT_DEPTH']
+    can optionally have:
+    ['FILTER', 'INFO'] and ['ALT_DEPTH_AM', 'DEPTH_AM']
     output needs to have:
     ['CHROM', 'POS', 'REF', 'ALT', 'FILTER', 'INFO', 'FORMAT', 'SAMPLE']
     """
@@ -58,10 +60,16 @@ def build_vcf_like_dataframe(mutations_dataframe, samplee):
         print(f"WARNING: INFO column is missing from the mutations dataframe. Setting it to 'SAMPLE={samplee};'")
         mutations_dataframe["INFO"] = f"SAMPLE={samplee};"
 
+    if 'ALT_DEPTH_AM' not in mutations_dataframe.columns or 'DEPTH_AM' not in mutations_dataframe.columns:
+        print("WARNING: Optional columns: ALT_DEPTH_AM and DEPTH_AM are missing from the mutations dataframe.")
+        print(" These are being initialized as its duplex values.")
+        mutations_dataframe['ALT_DEPTH_AM'] = mutations_dataframe['ALT_DEPTH']
+        mutations_dataframe['DEPTH_AM'] = mutations_dataframe['DEPTH']
+
     # Create a new dataframe with the required columns
-    vcf_like_df = mutations_dataframe[['CHROM', 'POS', 'REF', 'ALT', 'FILTER', 'INFO', 'DEPTH', 'ALT_DEPTH']].copy()
+    vcf_like_df = mutations_dataframe[['CHROM', 'POS', 'REF', 'ALT', 'FILTER', 'INFO', 'DEPTH', 'ALT_DEPTH', 'ALT_DEPTH_AM', 'DEPTH_AM']].copy()
     vcf_like_df["FORMAT"] = "GT:DP:VD:AD:AF:RD:ALD:CDP:CAD:NDP:CDPAM:CADAM:NDPAM"
-    vcf_like_df["SAMPLE"] = vcf_like_df[['DEPTH', 'ALT_DEPTH']].apply(
+    vcf_like_df["SAMPLE"] = vcf_like_df[['DEPTH', 'ALT_DEPTH', 'ALT_DEPTH_AM', 'DEPTH_AM']].apply(
         lambda x: "{GT}:{DP}:{VD}:{AD}:{AF}:{RD}:{ALD}:{CDP}:{CAD}:{NDP}:{CDPAM}:{CADAM}:{NDPAM}".format(
             GT="0/1",
             DP=x['DEPTH'],
@@ -73,8 +81,8 @@ def build_vcf_like_dataframe(mutations_dataframe, samplee):
             CDP=x['DEPTH'],
             CAD=f"{x['DEPTH'] - x['ALT_DEPTH']},{x['ALT_DEPTH']}",
             NDP="0",
-            CDPAM=x['DEPTH'],
-            CADAM=f"{x['DEPTH'] - x['ALT_DEPTH']},{x['ALT_DEPTH']}",
+            CDPAM=x['DEPTH_AM'],
+            CADAM=f"{x['DEPTH_AM'] - x['ALT_DEPTH_AM']},{x['ALT_DEPTH_AM']}",
             NDPAM="0"
             ),
         axis=1
```
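Before this change the `CDPAM`/`CADAM` sample fields were copies of the duplex `CDP`/`CAD` values; now they come from the optional `DEPTH_AM`/`ALT_DEPTH_AM` columns, falling back to the duplex depths only when either AM column is missing. The sketch below reproduces just those six depth fields for a single mutation, using a plain dict in place of a DataFrame row. `depth_fields` is a hypothetical helper, not part of `deepcsa_maf2samplevcfs.py`, and the other FORMAT fields (VD, AD, AF, RD, ALD) are left out because their computation is not shown in the diff.

```python
def depth_fields(row):
    """Build the CDP/CAD/NDP and CDPAM/CADAM/NDPAM values for one mutation.

    `row` is a plain dict standing in for one DataFrame row (hypothetical helper).
    """
    depth, alt = row["DEPTH"], row["ALT_DEPTH"]
    if "DEPTH_AM" in row and "ALT_DEPTH_AM" in row:
        depth_am, alt_am = row["DEPTH_AM"], row["ALT_DEPTH_AM"]
    else:
        # Same fallback as the script: if either AM column is missing,
        # both AM values are initialized from the duplex ones.
        depth_am, alt_am = depth, alt
    return {
        "CDP": depth,
        "CAD": f"{depth - alt},{alt}",  # "(DEPTH - ALT_DEPTH),ALT_DEPTH", as in the script
        "NDP": "0",
        "CDPAM": depth_am,
        "CADAM": f"{depth_am - alt_am},{alt_am}",
        "NDPAM": "0",
    }


with_am = depth_fields({"DEPTH": 100, "ALT_DEPTH": 5, "DEPTH_AM": 120, "ALT_DEPTH_AM": 7})
without_am = depth_fields({"DEPTH": 100, "ALT_DEPTH": 5})
print(with_am["CADAM"], without_am["CADAM"])  # 113,7 95,5
```

The script checks for the AM columns once per DataFrame rather than per row; with dict rows that all share the same keys, the two behave the same.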
```diff
@@ -86,7 +94,7 @@ def remove_deepcsa_filters(old_filt, filters_to_removee):
     """
     Remove deepCSA filters from the FILTER field of the VCF file.
     """
-    filter_result = sorted([ x for x in old_filt.split(";") if x not in filters_to_removee ])
+    filter_result = sorted([ x for x in str(old_filt).split(";") if x not in filters_to_removee ])
     return ";".join(filter_result) if filter_result != [] else "PASS"
 
 
```
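The `str()` wrapper matters when the FILTER value is missing: pandas typically represents a missing cell as `NaN`, a float, which has no `.split` method, so the old line raised `AttributeError`. The function below is the post-change version copied from the `+` side of the hunk; the filter names in the examples (`filter_a`, `filter_b`) are hypothetical placeholders.

```python
def remove_deepcsa_filters(old_filt, filters_to_removee):
    """
    Remove deepCSA filters from the FILTER field of the VCF file.
    """
    filter_result = sorted([ x for x in str(old_filt).split(";") if x not in filters_to_removee ])
    return ";".join(filter_result) if filter_result != [] else "PASS"


print(remove_deepcsa_filters("filter_b;filter_a", []))         # filter_a;filter_b
print(remove_deepcsa_filters("filter_a", ["filter_a"]))        # PASS
print(remove_deepcsa_filters(float("nan"), ["filter_a"]))      # nan
```

Note that with this change a missing FILTER value no longer crashes the conversion but is emitted as the literal string `nan` rather than `PASS`.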