@Minter: automated text-mining of microbial interactions

Cited by: 25
Authors
Lim, Kun Ming Kenneth [1, 2]
Li, Chenhao [1, 3]
Chng, Kern Rei [1]
Nagarajan, Niranjan [1, 3]
Affiliations
[1] Genome Inst Singapore, Computat & Syst Biol, Singapore 138672, Singapore
[2] Natl Univ Singapore, Fac Sci, Computat Biol Program, Singapore, Singapore
[3] Natl Univ Singapore, Dept Comp Sci, Singapore, Singapore
Keywords
DISEASE
DOI
10.1093/bioinformatics/btw357
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Microbial consortia are frequently defined by numerous interactions within the community that are key to understanding their function. While microbial interactions have been extensively studied experimentally, information regarding them is dispersed in the scientific literature. As manual collation is an infeasible option, automated data processing tools are needed to make this information easily accessible.
Results: We present @Minter, an automated information extraction system based on Support Vector Machines to analyze paper abstracts and infer microbial interactions. @Minter was trained and tested on a manually curated gold standard dataset of 735 species interactions and 3917 annotated abstracts, constructed as part of this study. Cross-validation analysis showed that @Minter was able to detect abstracts pertaining to one or more microbial interactions with high specificity (specificity = 95%, AUC = 0.97). Despite challenges in identifying specific microbial interactions in an abstract (interaction level recall = 95%, precision = 25%), @Minter was shown to reduce annotator workload 13-fold compared to alternate approaches. Applying @Minter to 175 bacterial species abundant on human skin, we identified a network of 357 literature-reported microbial interactions, demonstrating its utility for the study of microbial communities.
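The interaction-level figures in the abstract (recall = 95%, precision = 25%) can be made concrete with a little arithmetic. The sketch below is illustrative only: the counts `tp`, `fn`, and `fp` are hypothetical values chosen to reproduce the reported rates, not numbers from the paper, and the helper functions are not part of @Minter.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted interactions that are real."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real interactions that were predicted."""
    return tp / (tp + fn)


# Hypothetical counts (assumption, not from the paper): out of 100 true
# interactions, 95 are recovered and 5 are missed, alongside 285 false
# positives.
tp, fn, fp = 95, 5, 285

p = precision(tp, fp)
r = recall(tp, fn)

# Number of predicted candidates an annotator would check per true
# interaction found; this equals 1 / precision.
candidates_per_hit = (tp + fp) / tp

print(f"precision = {p:.2f}, recall = {r:.2f}")
print(f"candidates reviewed per confirmed interaction = {candidates_per_hit:.1f}")
```

Under these assumed counts, an annotator would review about four candidates for each confirmed interaction while missing few true ones. That is the kind of trade-off that can still reduce manual workload despite low precision.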
Pages: 2981-2987
Page count: 7