Machine Learning Techniques Accurately Classify Microbial Communities by Bacterial Vaginosis Characteristics

Cited by: 63
Authors
Beck, Daniel [1]
Foster, James A.
Affiliations
[1] Univ Idaho, Dept Biol Sci, Moscow, ID 83843 USA
Funding
National Science Foundation (USA)
Keywords
VAGINAL MICROBIOME; INFECTIONS; SEQUENCES;
DOI
10.1371/journal.pone.0087830
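Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract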
Microbial communities are important to human health. Bacterial vaginosis (BV) is a disease associated with the vaginal microbiome. While the causes of BV are unknown, the microbial community in the vagina appears to play a role. We use three machine-learning techniques to classify microbial communities into BV categories: genetic programming (GP), random forests (RF), and logistic regression (LR). We evaluate the classification accuracy of each technique on two different datasets. We then deconstruct the classification models to identify important features of the microbial community. We find that the classification models produced by the machine-learning techniques obtain accuracies above 90% for Nugent score BV and above 80% for Amsel criteria BV. While the classification models identify largely different sets of important features, the shared features often agree with past research.
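The workflow summarized in the abstract (train a classifier on community composition, measure held-out accuracy, then inspect the model for influential features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the taxon names are hypothetical placeholders, and a plain gradient-descent logistic regression stands in for the authors' actual LR setup. GP and RF are not shown.

```python
import math
import random

# Hypothetical taxon names; the paper's real features are not reproduced here.
TAXA = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]


def make_sample(bv_positive, rng):
    """Return synthetic relative abundances (summing to 1) for one community."""
    if bv_positive:
        # Assumed pattern: taxon_A is depleted in BV-positive samples.
        raw = [rng.uniform(0.0, 0.2)] + [rng.uniform(0.2, 1.0) for _ in range(3)]
    else:
        # Assumed pattern: taxon_A dominates BV-negative samples.
        raw = [rng.uniform(1.0, 2.0)] + [rng.uniform(0.0, 0.3) for _ in range(3)]
    total = sum(raw)
    return [v / total for v in raw]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Classify one sample as BV-positive (1) or BV-negative (0)."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


def run_demo(seed=0):
    """Train on 80 synthetic samples, test on 20, and rank taxa by |weight|."""
    rng = random.Random(seed)
    data = [(make_sample(i % 2 == 1, rng), i % 2) for i in range(100)]
    rng.shuffle(data)
    train, test = data[:80], data[80:]
    w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
    accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
    ranked = sorted(zip(TAXA, w), key=lambda t: abs(t[1]), reverse=True)
    return accuracy, ranked


if __name__ == "__main__":
    accuracy, ranked = run_demo()
    print(f"held-out accuracy: {accuracy:.2f}")
    for name, weight in ranked:
        print(f"{name}: {weight:+.3f}")
```

Ranking coefficients by absolute value is only one simple way to read feature importance from a linear model; it is used here because the features share a common scale (relative abundance), not because the paper is known to use it.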
Pages: 8