ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion

Cited by: 131
Authors
Hulstaert, Niels [1, 2]
Shofstahl, Jim [3]
Sachsenberg, Timo [4]
Walzer, Mathias [5]
Barsnes, Harald [6, 7]
Martens, Lennart [1, 2]
Perez-Riverol, Yasset [5]
Affiliations
[1] UGent VIB, Ctr Med Biotechnol, B-9000 Ghent, Belgium
[2] Univ Ghent, Dept Biomol Med, B-9000 Ghent, Belgium
[3] Thermo Fisher Sci, 355 River Oaks Pkwy, San Jose, CA 95134 USA
[4] Univ Tubingen, Dept Comp Sci, Appl Bioinformat, Sand 14, D-72076 Tubingen, Germany
[5] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
[6] Univ Bergen, Dept Informat, Computat Biol Unit CBU, N-5020 Bergen, Norway
[7] Univ Bergen, Dept Biomed, Prote Unit PROBE, N-5020 Bergen, Norway
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC); EU Horizon 2020; Wellcome Trust (UK);
Keywords
bioinformatics; file formats; open source; cloud; mass spectrometry; software; big data; workflows; mzML; metadata; MASS-SPECTROMETRY DATA; PROTEOMICS; IDENTIFICATION; DATABASES;
DOI
10.1021/acs.jproteome.9b00328
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The field of computational proteomics is approaching the big data age, driven both by continuous growth in the number of samples analyzed per experiment and by the growing amount of data obtained in each analytical run. To process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools cannot operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built a Conda package and a BioContainers container around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for users not familiar with command-line tools. Finally, we performed a benchmark of ThermoRawFileParser and msconvert to verify that the converted mzML files contain reliable quantitative results.
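To illustrate why conversion to an open text format such as MGF makes spectra accessible to cross-platform tools, the following is a minimal sketch of an MGF reader in plain Python. It is not part of ThermoRawFileParser; the function name `read_mgf`, the dictionary layout, and the example spectrum (title, masses, intensities) are all hypothetical and chosen only for illustration. The sketch assumes the common MGF layout of `BEGIN IONS` / `END IONS` blocks holding `KEY=value` parameters followed by whitespace-separated m/z and intensity pairs.

```python
from typing import Dict, Iterable, Iterator


def read_mgf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per BEGIN IONS / END IONS block (hypothetical helper)."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                spectrum["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by intensity.
                fields = line.split()
                if len(fields) >= 2:
                    spectrum["peaks"].append((float(fields[0]), float(fields[1])))


# Made-up spectrum used only to exercise the reader.
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=hypothetical_scan_1
PEPMASS=500.25 12345.0
CHARGE=2+
100.1 10.0
200.2 20.5
END IONS
"""

spectra = list(read_mgf(EXAMPLE_MGF.splitlines()))
```

Because the reader relies only on the standard library, it runs unchanged on Linux clusters and cloud nodes, which is the kind of portability the abstract describes. For mzML, which is XML-based and considerably richer, a dedicated parsing library would be the more realistic choice.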
Pages: 537-542
Page count: 6