File Handling

General information about peak and feature maps

For general information about a peak or feature map, use the FileInfo tool.

It can print RT, m/z and intensity ranges, the overall number of peaks, and the distribution of MS levels.
It can print a statistical summary of intensities.
It can print some meta information.
It can validate XML files against their schema.
It can check for corrupt data in peak files.

See FileInfo --help for details.

Problems with input files

If you are experiencing problems while processing an XML file, check whether the file validates against the XML schema:

FileInfo -v -in infile.mzML

Validation is available for several file formats including mzML, mzData, mzXML, featureXML and idXML.

Another frequently-occurring problem is corrupt data. You can check for corrupt data in peak files with FileInfo as well:

FileInfo -c -in infile.mzML
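When many files need checking, the two FileInfo calls above can be scripted. The following is a minimal sketch, assuming FileInfo is on the PATH; the helper names fileinfo_command and check_files are hypothetical, and only the -v, -c and -in options shown above are used. Whether a failed check is reflected in the exit code is an assumption, so the printed FileInfo output should be inspected as well.

```python
import shutil
import subprocess

# Options taken from the commands above: -v validates, -c checks for corrupt data.
CHECK_FLAGS = {"validate": "-v", "corrupt": "-c"}


def fileinfo_command(path, check):
    """Build the FileInfo argument list for one input file (hypothetical helper)."""
    if check not in CHECK_FLAGS:
        raise ValueError(f"unknown check: {check!r}")
    return ["FileInfo", CHECK_FLAGS[check], "-in", str(path)]


def check_files(paths):
    """Run both checks on every file and return {path: {check: exit_code}}.

    Assumption: FileInfo is installed and on PATH. The meaning of the
    exit code is not documented here, so treat it only as a hint.
    """
    if shutil.which("FileInfo") is None:
        raise RuntimeError("FileInfo not found on PATH")
    results = {}
    for path in paths:
        results[str(path)] = {
            check: subprocess.run(fileinfo_command(path, check)).returncode
            for check in CHECK_FLAGS
        }
    return results
```

For example, check_files(["a.mzML", "b.mzML"]) would run four FileInfo invocations and collect their exit codes.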

Converting your files to mzML

The TOPP tools work only on the HUPO-PSI mzML format. If you need to convert mzData, mzXML or ANDI/MS data to mzML, use the FileConverter, e.g.

FileConverter -in infile.mzXML -out outfile.mzML

If the file extension is the name of the format, the tool derives the format from the extension. For other extensions, the file formats of the input and output file can be given explicitly.

Compression of mzML files

TOPP tools now support writing compressed .mzML.gz files for efficient storage. For example, PeakPickerHiRes can output compressed files:

PeakPickerHiRes -in input.mzML -out output.mzML.gz -threads 8

Compression uses pigz (parallel gzip) if installed for faster performance, falling back to OpenMS's internal compression mechanism otherwise. When using pigz, OpenMS limits threads to the user-specified value (e.g., -threads 8) via omp_get_max_threads(), ensuring compatibility with cluster schedulers. Install pigz for optimal speed.

Trade-offs:

Efficiency: .mzML.gz files are 2-3x smaller; pigz is significantly faster but CPU-intensive.
Compatibility: Ensure downstream tools support .mzML.gz.

Compressed output also works with indexed mzML.
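To see how much space compression saves on your own data, or to confirm that a .mzML.gz file is intact before passing it to a downstream tool, a short script can help. The following is a minimal sketch that uses Python's standard gzip module, not pigz or the OpenMS writer; the function names are hypothetical, and the ratio it reports depends on the data.

```python
import gzip
import shutil
import zlib
from pathlib import Path


def gzip_copy(src, dst):
    """Write a gzip-compressed copy of src to dst using Python's gzip module."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def compression_ratio(original, compressed):
    """Return size(original) / size(compressed); values above 1 mean a smaller file."""
    return Path(original).stat().st_size / Path(compressed).stat().st_size


def is_readable_gzip(path):
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read in 1 MiB chunks so large files are not held in memory.
            while fh.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

For example, calling is_readable_gzip("output.mzML.gz") after a run is a quick check that the file was written completely.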

Converting between DTA and mzML

Sequest DTA files can be extracted from an mzML file using the DTAExtractor:

DTAExtractor -in infile.mzML -out outfile

The retention time of a scan, the precursor mass-to-charge ratio (for MS/MS scans) and the file extension are appended to the output file name.

To combine several files (e.g. DTA files) into an mzML file, use the FileMerger:

FileMerger -in infile_list.txt -out outfile.mzML

The retention times of the scans can be generated, taken from infile_list.txt, or extracted from the DTA file names. See the FileMerger documentation for details.

Extracting part of the data from a file

To extract part of the data from an mzML file, use the FileFilter tool. It allows filtering for RT, m/z and intensity range or for MS level. To extract the MS/MS scans between retention time 100 and 1500, use the following command:

FileFilter -in infile.mzML -levels 2 -rt 100:1500 -out outfile.mzML
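If the same extraction is repeated with different ranges, building the command programmatically avoids typos in the low:high range syntax. The following is a minimal sketch; filefilter_command is a hypothetical helper, and it uses only the -in, -levels, -rt and -out options from the command above.

```python
def filefilter_command(infile, outfile, ms_level=None, rt_range=None):
    """Build a FileFilter argument list (hypothetical helper).

    ms_level: MS level to keep, e.g. 2 for MS/MS scans.
    rt_range: (low, high) retention time bounds, written as "low:high".
    """
    cmd = ["FileFilter", "-in", str(infile)]
    if ms_level is not None:
        cmd += ["-levels", str(ms_level)]
    if rt_range is not None:
        low, high = rt_range
        if low > high:
            raise ValueError("rt_range must be given as (low, high)")
        cmd += ["-rt", f"{low}:{high}"]
    cmd += ["-out", str(outfile)]
    return cmd
```

The resulting list can be passed to subprocess.run, assuming FileFilter is on the PATH.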

Conversion Between OpenMS XML Formats and Text Formats

Export of OpenMS XML formats

As TOPP offers no functionality for statistical analysis, this step is normally done using external statistics packages. In order to export the OpenMS XML formats into an appropriate format for these packages the TOPP TextExporter can be used.

It converts the following OpenMS XML formats to text files:

featureXML
idXML
consensusXML

Using the TextExporter is very simple:

TextExporter -in infile.idXML -out outfile.txt

Import of feature data to OpenMS

OpenMS offers a lot of visualization and analysis functionality for feature data. Feature data in text format, e.g. from other analysis tools, can be imported using the TextImporter. The default mode accepts comma-separated values containing the following columns: RT, m/z, intensity. Additional meta data columns may follow. If meta data is used, the meta data column names have to be specified in a header line.

Without headers:

1201	503.123	1435000
1201	1006.246	1235200

Or with headers:

RT	m/z	Int	isHeavy	myMeta
1201	503.123	1435000	true	2
1201	1006.246	1235200	maybe	1

Example invocation:

TextImporter -in infile.txt -out outfile.featureXML
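An input file in the layout shown above can be produced from another tool's results with a few lines of code. The following is a minimal sketch; write_feature_table is a hypothetical helper, and the header names RT, m/z and Int are copied from the sample above. The delimiter is a parameter: it defaults to a tab to match the sample rows, although the text describes the values as comma-separated, so which separators TextImporter accepts should be checked against its documentation.

```python
import csv


def write_feature_table(path, rows, meta_names=(), delimiter="\t"):
    """Write rows of (RT, m/z, intensity, *meta values) as a text table.

    A header line is written only when meta data columns are present,
    since the text above requires it only in that case.
    """
    width = 3 + len(meta_names)
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
        if meta_names:
            writer.writerow(["RT", "m/z", "Int", *meta_names])
        for row in rows:
            if len(row) != width:
                raise ValueError(f"expected {width} columns, got {len(row)}")
            writer.writerow(row)
```

For example, write_feature_table("infile.txt", [(1201, 503.123, 1435000, "true", 2)], meta_names=("isHeavy", "myMeta")) writes the header line followed by one feature row.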

The tool also supports data from msInspect, SpecArray and Kroenik (Hardkloer sibling); just specify the -mode option accordingly.

Import of protein/peptide identification data to OpenMS

Peptide/protein identification data from several identification engines can be converted to idXML format using the IDFileConverter tool.

It can currently read the following formats:

Sequest output folder
pepXML file
idXML file

It can currently write the following formats:

pepXML
idXML

This example shows how to convert pepXML to idXML:

IDFileConverter -in infile.pepXML -out outfile.idXML
