MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data

Cited by: 19
Authors
Simon, L. M. [1]
Karg, S. [1]
Westermann, A. J. [2, 3]
Engel, M. [1, 4]
Elbehery, A. H. A. [5]
Hense, B. [1]
Heinig, M. [1]
Deng, L. [5]
Theis, F. J. [1, 6]
Affiliations
[1] Helmholtz Zentrum Munchen, German Res Ctr Environm Hlth, Inst Computat Biol, Ingolstadter Landstr 1, D-85764 Neuherberg, Germany
[2] Univ Wurzburg, Inst Mol Infect Biol, Wurzburg, Germany
[3] Helmholtz Inst RNA Based Infect Res, Wurzburg, Germany
[4] Helmholtz Zentrum Munchen, German Res Ctr Environm Hlth, Sci Comp Res Unit, Neuherberg, Germany
[5] Helmholtz Zentrum Munchen, German Res Ctr Environm Hlth, Inst Virol, Neuherberg, Germany
[6] Tech Univ Munich, Munich, Germany
Funding
European Union Horizon 2020;
Keywords
high-performance computing; big data; RNA-seq; sequence read archive; metatranscriptomics; microbiome; virome; human disease; infection; MICROBIOME; PATHOGEN; OBESITY; GENOMES; HEALTH;
DOI
10.1093/gigascience/giy070
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Background: With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to re-analyze publicly available datasets at an unprecedented scale. A growing number of studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts; however, its generic nature also enables the detection of microbial and viral transcripts.
Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from six independent, controlled infection experiments of cell line models and compared them with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from more than 17,000 samples from more than 400 studies relevant to human disease using state-of-the-art high-performance computing systems. The resulting data from this large-scale re-analysis are made available in the presented MetaMap resource.
Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation toward the role of the microbiome in human disease. Additionally, code to process new datasets and perform statistical analyses is made available.
Pages: 8