Extracting causal relations from the literature with word vector mapping

Cited by: 19
Authors
An, Ning [1, 2]
Xiao, Yongbo [1, 2]
Yuan, Jing [3]
Yang, Jiaoyun [1, 2]
Alterovitz, Gil [4]
Affiliations
[1] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
[3] Chinese Acad Med Sci, Dept Neurol, Peking Union Med Coll Hosp, Beijing, Peoples R China
[4] Harvard Med Sch, Boston Childrens Hosp, Boston, MA 02115 USA
Funding
National Key Research and Development Program of China
Keywords
Causality; Literature analysis; Word vector; Causal graph; Causal extraction
DOI
10.1016/j.compbiomed.2019.103524
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Causal graphs play an essential role in determining causality and have been applied in many domains, including biology and medicine. Traditional causal graph construction methods are usually data-driven and may not deliver the desired graph accuracy. Given the vast number of publications containing causal knowledge, it becomes possible to extract causal relations from the literature to help establish causal graphs. Current supervised-learning-based causality extraction methods require sufficient labeled data to train a model, and rule-based causality extraction methods are limited by their predefined patterns. This paper proposes a causality extraction framework that integrates rule-based methods and unsupervised learning models to overcome these limitations. The proposed method consists of three modules: data preprocessing, syntactic pattern matching, and causality determination. In data preprocessing, abstracts are crawled based on attribute names, and sentences are then extracted and simplified. In syntactic pattern matching, the simplified sentences are parsed to obtain part-of-speech tags, and triples are extracted from these tags by matching two designed syntactic patterns. In causality determination, four verb seed sets are initialized, and word vectors are constructed for the verbs in both the seed sets and the triples by applying an unsupervised machine learning model. Causal relations are identified by comparing the similarity between the verbs in each triple and those in each seed set, which overcomes the limited coverage of the seed sets. Causality extraction results on attributes drawn from the risk factors for Alzheimer's disease show that our method outperforms Bui's method and Alashri's method in terms of precision, recall, specificity, accuracy and F-score, with increases in the F-score of 8.29% and 5.37%, respectively.
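The causality determination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four seed sets, the word vector model, or any similarity threshold, so the two seed-set labels, the three-dimensional toy vectors, the cosine measure, and the 0.8 cutoff below are all assumptions made purely for illustration.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would learn these with an unsupervised model on a corpus.
TOY_VECTORS = {
    "cause": (0.9, 0.1, 0.0),
    "induce": (0.8, 0.2, 0.1),
    "prevent": (0.1, 0.9, 0.0),
    "reduce": (0.2, 0.8, 0.1),
    "trigger": (0.85, 0.15, 0.05),
    "measure": (0.0, 0.1, 0.9),
}

# Hypothetical seed sets; the source mentions four sets but does not list them.
SEED_SETS = {
    "positive_causal": ["cause", "induce"],
    "negative_causal": ["prevent", "reduce"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_verb(verb, vectors=TOY_VECTORS, seed_sets=SEED_SETS, threshold=0.8):
    """Return the best-matching seed-set label, or None if nothing is close enough."""
    if verb not in vectors:
        return None
    scores = {}
    for label, seeds in seed_sets.items():
        # Score a seed set by its most similar member verb.
        scores[label] = max(
            (cosine(vectors[verb], vectors[s]) for s in seeds if s in vectors),
            default=0.0,
        )
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def extract_causal(triples):
    """Keep (subject, verb, object) triples whose verb matches a seed set."""
    results = []
    for subject, verb, obj in triples:
        label = classify_verb(verb)
        if label is not None:
            results.append((subject, label, obj))
    return results


if __name__ == "__main__":
    # Placeholder triples; the entity names are not taken from the paper.
    triples = [("factor_a", "trigger", "outcome_b"),
               ("study_c", "measure", "outcome_b")]
    print(extract_causal(triples))
```

Because the verb "trigger" is not in either seed set but lies close to "cause" in the toy vector space, it is still labeled, which is the sense in which vector similarity extends the coverage of a fixed seed list.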
Pages: 8
Related papers
28 in total
[1] Snowball: Extracting Causal Chains from Climate Change Text Corpora [J].
Alashri, Saud; Tsai, Jiun-Yi; Koppela, Anvesh Reddy; Davulcu, Hasan.
2018 1ST INTERNATIONAL CONFERENCE ON DATA INTELLIGENCE AND SECURITY (ICDIS 2018), 2018: 234-241
[2] Automated ontology generation framework powered by linked biomedical ontologies for disease-drug domain [J].
Alobaidi, Mazen; Malik, Khalid Mahmood; Hussain, Maqbool.
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2018, 165: 117-128
[3] [Anonymous], 2008, LREC
[4] [Anonymous], **DROPPED REF**
[5] Extracting Causal Relations Among Complex Events in Natural Science Literature [J].
Barik, Biswanath; Marsi, Erwin; Ozturk, Pinar.
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017, 2017, 10260: 131-137
[6] Extracting causal relations on HIV drug resistance from literature [J].
Bui, Quoc-Chinh; Nuallain, Breanndan O.; Boucher, Charles A.; Sloot, Peter M. A.
BMC BIOINFORMATICS, 2010, 11
[7] Bunescu Razvan C, 2007, NATURAL LANGUAGE PRO, P29
[8] Incremental cue phrase learning and bootstrapping method for causality extraction using cue phrase and word pair probabilities [J].
Chang, DS; Choi, KS.
INFORMATION PROCESSING & MANAGEMENT, 2006, 42(03): 662-678
[9] Improving Bayesian network structure learning with mutual information-based node ordering in the K2 algorithm [J].
Chen, Xue-Wen; Anantha, Gopalakrishna; Lin, Xiaotong.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20(05): 628-640
[10] A BAYESIAN METHOD FOR THE INDUCTION OF PROBABILISTIC NETWORKS FROM DATA [J].
COOPER, GF; HERSKOVITS, E.
MACHINE LEARNING, 1992, 9(04): 309-347