-update README with more details about script execution
-add gzipped parsed text file results
-update gitignore with fixes to ENCODE StrainID directory typos
-change results parsing script to use pandas and include every Cell Line score instead of just the best one
ENCODE metadata was pulled on May 12, 2021 using the `scripts/get_metadata.py` script, which pulls all Biosample accessions that have classification="cell_line" and whose string matches one of the cell lines in our hg19_VCF database. These Biosample accessions are then used to pull File accessions with type=BAM and assembly=hg19. We did not filter by assay for this analysis. The File accessions are saved with all relevant metadata to the `210512_sample_metadata.txt` file.
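The two-stage selection described above can be sketched roughly as follows. This is an illustration only, not the contents of `scripts/get_metadata.py`: the record layout, the field names (`name`, `biosample`, `type`, `assembly`), the accession values, and the exact-match rule for cell line names are all assumptions made for the example, and no ENCODE API call is shown.

```python
# Hypothetical stand-in for the cell line names in the hg19_VCF database.
KNOWN_CELL_LINES = {"K562", "HepG2"}


def select_biosamples(biosamples, known_lines):
    """Keep accessions of cell_line Biosamples whose name is a known cell line."""
    return [
        b["accession"]
        for b in biosamples
        if b["classification"] == "cell_line" and b["name"] in known_lines
    ]


def select_files(files, biosample_accessions):
    """Keep accessions of hg19 BAM Files attached to the selected Biosamples."""
    wanted = set(biosample_accessions)
    return [
        f["accession"]
        for f in files
        if f["biosample"] in wanted and f["type"] == "BAM" and f["assembly"] == "hg19"
    ]


# Made-up records; the accessions are placeholders, not real ENCODE IDs.
biosamples = [
    {"accession": "BS-1", "classification": "cell_line", "name": "K562"},
    {"accession": "BS-2", "classification": "tissue", "name": "K562"},
    {"accession": "BS-3", "classification": "cell_line", "name": "UnknownLine"},
]
files = [
    {"accession": "F-1", "biosample": "BS-1", "type": "BAM", "assembly": "hg19"},
    {"accession": "F-2", "biosample": "BS-1", "type": "BAM", "assembly": "GRCh38"},
    {"accession": "F-3", "biosample": "BS-1", "type": "fastq", "assembly": "hg19"},
    {"accession": "F-4", "biosample": "BS-3", "type": "BAM", "assembly": "hg19"},
]

selected_biosamples = select_biosamples(biosamples, KNOWN_CELL_LINES)
selected_files = select_files(files, selected_biosamples)
print(selected_biosamples, selected_files)
```

Note that no assay field appears in either filter, which mirrors the statement that the analysis was not filtered by assay.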
parser=argparse.ArgumentParser(description='Parse metadata file and GenoPipe output to check detection rates of the GenoPipe tool.')
+parser=argparse.ArgumentParser(description='Parse metadata file and StrainID output to check per sample detection by StrainID scores.')
parser.add_argument('-m','--metadata', metavar='metadata_fn', required=True, help='the metadata file downloaded with ENCODE dataset that includes info like PE/SE, cell line, assay type, and read lengths/SE-PE')
parser.add_argument('-i','--input-dir', metavar='input_dir', required=True, help='the directory where all the EpitopeID output files were saved (*strain.tab)')
+parser.add_argument('-o','--output', metavar='output_fn', required=True, help='the output filepath for final TSV with parsed StrainID scores')
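For orientation, the three options in this diff might be assembled and exercised as in the sketch below. The flag names and metavars come from the diff. The `build_parser` wrapper, the shortened help strings, the `strainid_output` directory, and the `parsed_scores.tsv` filename are hypothetical and used only to make the example runnable.

```python
import argparse


def build_parser():
    """Assemble the command-line options shown in the diff (help text shortened)."""
    parser = argparse.ArgumentParser(
        description='Parse metadata file and StrainID output to check '
                    'per sample detection by StrainID scores.')
    parser.add_argument('-m', '--metadata', metavar='metadata_fn', required=True,
                        help='the metadata file downloaded with the ENCODE dataset')
    parser.add_argument('-i', '--input-dir', metavar='input_dir', required=True,
                        help='the directory holding the *strain.tab output files')
    parser.add_argument('-o', '--output', metavar='output_fn', required=True,
                        help='the output filepath for the final TSV')
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    '-m', '210512_sample_metadata.txt',
    '-i', 'strainid_output',        # hypothetical directory name
    '-o', 'parsed_scores.tsv',      # hypothetical output filename
])
print(args.metadata, args.input_dir, args.output)
```

argparse converts the dash in `--input-dir` to an underscore, so the value is read as `args.input_dir`. All three options are marked `required=True`, so omitting any of them makes `parse_args` exit with a usage error.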