Developing a Reproducible Microbiome Data Analysis Pipeline Using the Amazon Web Services Cloud for a Cancer Research Group: Proof-of-Concept Study

Cited by: 11
Authors
Bai, Jinbing [1, 2]
Jhaney, Ileen [3]
Wells, Jessica [1, 2]
Affiliations
[1] Emory Univ, Nell Hodgson Woodruff Sch Nursing, 1520 Clifton Rd NE, Atlanta, GA 30322 USA
[2] Emory Univ, Winship Canc Inst, Canc Prevent & Control Program, Atlanta, GA 30322 USA
[3] Emory Univ, Winship Canc Inst, Winship Res Informat Shared Resource, Atlanta, GA 30322 USA
Keywords
Amazon Web Services; cloud computation; microbiome; pipeline; sequence analysis; gut microbiota; health
DOI
10.2196/14667
CLC classification number
R-058
Abstract
Background: Cloud computing for microbiome data sets can significantly increase working efficiencies and expedite the translation of research findings into clinical practice. The Amazon Web Services (AWS) cloud provides an invaluable option for microbiome data storage, computation, and analysis.
Objective: The goals of this study were to develop a microbiome data analysis pipeline by using AWS cloud and to conduct a proof-of-concept test for microbiome data storage, processing, and analysis.
Methods: A multidisciplinary team was formed to develop and test a reproducible microbiome data analysis pipeline with multiple AWS cloud services that could be used for storage, computation, and data analysis. The microbiome data analysis pipeline developed in AWS was tested by using two data sets: 19 vaginal microbiome samples and 50 gut microbiome samples.
Results: Using AWS features, we developed a microbiome data analysis pipeline that included Amazon Simple Storage Service for microbiome sequence storage, Linux Elastic Compute Cloud (EC2) instances (ie, servers) for data computation and analysis, and security keys to create and manage the use of encryption for the pipeline. Bioinformatics and statistical tools (ie, Quantitative Insights Into Microbial Ecology 2 and RStudio) were installed within the Linux EC2 instances to run microbiome statistical analysis. The microbiome data analysis pipeline was performed through command-line interfaces within the Linux operating system or in the Mac operating system. Using this new pipeline, we were able to successfully process and analyze 50 gut microbiome samples within 4 hours at a very low cost (a c4.4xlarge EC2 instance costs $0.80 per hour). Gut microbiome findings regarding diversity, taxonomy, and abundance analyses were easily shared within our research team.
Conclusions: Building a microbiome data analysis pipeline with AWS cloud is feasible. This pipeline is highly reliable, computationally powerful, and cost effective. Our AWS-based microbiome analysis pipeline provides an efficient tool to conduct microbiome data analysis.
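The "very low cost" claim can be made concrete with the two figures the abstract reports: a c4.4xlarge rate of $0.80 per hour and a run time of at most 4 hours for 50 gut microbiome samples. The sketch below is illustrative only. It assumes simple linear hourly billing for a single instance and ignores storage, data transfer, and idle time, none of which the abstract quantifies. The helper name `estimate_compute_cost` is hypothetical.

```python
from decimal import Decimal

# Figures quoted in the abstract.
HOURLY_RATE_USD = Decimal("0.80")  # c4.4xlarge EC2 instance, per hour
RUNTIME_HOURS = Decimal("4")       # upper bound: "within 4 hours"
SAMPLE_COUNT = 50                  # gut microbiome samples


def estimate_compute_cost(hourly_rate, hours):
    """Hypothetical helper: compute-only cost, assuming linear hourly billing."""
    return hourly_rate * hours


# Upper-bound compute cost for the whole run, then averaged per sample.
total = estimate_compute_cost(HOURLY_RATE_USD, RUNTIME_HOURS)
per_sample = total / SAMPLE_COUNT

print(f"Estimated compute cost: ${total:.2f}")
print(f"Estimated cost per sample: ${per_sample:.3f}")
```

Under these assumptions the compute cost is bounded by about $3.20 for the full run, or roughly $0.064 per sample. Actual charges would depend on the region, the pricing model, and the storage used.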
Pages: 325-333
Page count: 9