Applying Monte Carlo Simulation to Biomedical Literature to Approximate Genetic Network

Cited by: 14
Authors
Al-Dalky, Rami [1]
Taha, Kamal [2]
Al Homouz, Dirar [3]
Qasaimeh, Murad [2]
Affiliations
[1] Case Western Reserve Univ, Dept Elect Engn & Comp Sci, Cleveland, OH 44106 USA
[2] Khalifa Univ, Dept Elect & Comp Engn, Abu Dhabi, U Arab Emirates
[3] Khalifa Univ, Dept Appl Math & Sci, Abu Dhabi, U Arab Emirates
Keywords
Text mining; information extraction; biological NLP; biomedical literature; gene regulatory network; Monte Carlo simulation; gene-disease associations; semantic similarity; ontology; features; associations; tool
DOI
10.1109/TCBB.2015.2481399
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Biologists often need to identify the genes associated with a given set of genes or a given disease. In this paper we propose a classifier system called Monte Carlo for Genetic Network (MCforGN) that can construct genetic networks, identify functionally related genes, and predict gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found in the abstracts associated with g. It then ranks these genes to determine which co-occur most frequently with g. It overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo simulation to approximate genetic networks: it conducts repeated random sampling to obtain numerical results and to optimize them, and it analyzes the results with a series of statistical tests to obtain the probabilities of different genes' co-occurrences. MCforGN detects gene-disease associations by combining centrality measures (to identify the central genes in disease-specific genetic networks) with Monte Carlo simulation. MCforGN aims to enhance state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN experimentally against nine other approaches, and the results showed marked improvement.
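The co-occurrence ranking described in the abstract can be illustrated with a minimal sketch. This is not the MCforGN implementation, whose details the abstract does not give. It assumes each abstract has already been reduced to a set of gene symbols, and it uses plain Monte Carlo estimation: draw abstracts that mention g at random, many times, and record how often each other gene appears. The function names, the toy gene sets, and the parameters `n_trials` and `seed` are hypothetical.

```python
import random
from collections import Counter


def cooccurrence_probabilities(abstracts, target, n_trials=5000, seed=0):
    """Estimate P(gene appears | abstract mentions target) by random sampling.

    abstracts: list of sets of gene symbols (an assumed, pre-extracted form).
    Returns a dict mapping each co-occurring gene to its estimated probability.
    """
    rng = random.Random(seed)
    # Keep only the abstracts associated with the target gene.
    relevant = [genes for genes in abstracts if target in genes]
    if not relevant:
        return {}
    hits = Counter()
    for _ in range(n_trials):
        # One trial: draw one abstract uniformly at random (with replacement).
        sampled = rng.choice(relevant)
        hits.update(gene for gene in sampled if gene != target)
    return {gene: count / n_trials for gene, count in hits.items()}


def rank_genes(probabilities):
    """Return genes sorted by estimated co-occurrence probability, highest first."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


# Toy data (hypothetical gene sets, not taken from the paper).
toy_abstracts = [
    {"BRCA1", "TP53"},
    {"BRCA1", "TP53", "ATM"},
    {"BRCA1"},
    {"EGFR", "KRAS"},
]
probs = cooccurrence_probabilities(toy_abstracts, "BRCA1")
ranking = rank_genes(probs)
```

On this toy input the estimates approach the exact fractions (2/3 for TP53, 1/3 for ATM) as `n_trials` grows. The system described in the abstract additionally applies statistical tests and centrality measures, which this sketch omits.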
Pages: 494-504
Page count: 11