Large-scale extraction of drug-disease pairs from the medical literature

Cited by: 20
Authors
Wang, Pengwei [1]
Hao, Tianyong [2]
Yan, Jun [3]
Jin, Lianwen [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Univ Foreign Studies, Cisco Sch Informat, Guangzhou, Guangdong, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
KNOWLEDGE; ACQUISITION;
DOI
10.1002/asi.23876
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Automatic extraction of accurate drug-disease pairs at large scale from the medical literature plays an important role in drug repurposing. However, most existing extraction methods are supervised, and manually labeling drug-disease pair datasets is costly and time-consuming, while many drug-disease pairs remain buried in free text. In this work, we first use a pattern-based method to automatically extract drug-disease pairs with treatment and inducement relationships from free text. Then, to characterize a drug-disease relation, we propose a network embedding algorithm that calculates the degree of correlation of a drug-disease pair. In the experiments, we apply the method to 27 million medical abstracts and titles available on PubMed and extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. The method achieves a precision of 0.912 and a recall of 0.898 in extracting frequent treatment drug-disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting frequent inducement drug-disease pairs. In addition, the proposed information network embedding algorithm efficiently reflects the degree of correlation of drug-disease pairs, achieving a precision of 0.802 and a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
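As a rough illustration of what pattern-based extraction of treatment and inducement pairs can look like, the sketch below matches a few hand-written templates against sentences and scores the result with precision and recall. The lexicons, the templates, and the function names `extract_pairs` and `precision_recall` are all hypothetical and chosen for this example; the abstract does not state the paper's actual patterns or lexicons, and the network embedding step is not shown.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only (not from the paper).
DRUGS = {"aspirin", "metformin", "cisplatin"}
DISEASES = {"headache", "type 2 diabetes", "nephrotoxicity"}


def _alt(terms):
    """Build a regex alternation, longest term first so multiword terms win."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Hypothetical surface templates; real systems would use many more.
PATTERNS = {
    "treatment": re.compile(
        rf"\b(?P<drug>{_alt(DRUGS)})\s+"
        rf"(?:treats|is used to treat|in the treatment of)\s+"
        rf"(?P<disease>{_alt(DISEASES)})\b",
        re.IGNORECASE,
    ),
    "inducement": re.compile(
        rf"\b(?P<drug>{_alt(DRUGS)})"
        rf"(?:-induced|\s+induces|\s+causes)\s+"
        rf"(?P<disease>{_alt(DISEASES)})\b",
        re.IGNORECASE,
    ),
}


def extract_pairs(sentences):
    """Count (relation, drug, disease) triples matched by any template."""
    counts = Counter()
    for sentence in sentences:
        for relation, pattern in PATTERNS.items():
            for match in pattern.finditer(sentence):
                key = (relation,
                       match.group("drug").lower(),
                       match.group("disease").lower())
                counts[key] += 1
    return counts


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Counting matches per triple makes it possible to separate frequent pairs from rare ones by thresholding the counts, which is one plausible reading of the "frequent pairs" evaluated in the abstract.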
Pages: 2649-2661
Page count: 13