Large-scale extraction of drug-disease pairs from the medical literature

Cited by: 17
Authors
Wang, Pengwei [1]
Hao, Tianyong [2]
Yan, Jun [3]
Jin, Lianwen [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Univ Foreign Studies, Cisco Sch Informat, Guangzhou, Guangdong, Peoples R China
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
KNOWLEDGE; ACQUISITION;
DOI
10.1002/asi.23876
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Automatic extraction of accurate drug-disease pairs at large scale from the medical literature plays an important role in drug repurposing. However, most existing extraction methods are supervised, and manually labeling drug-disease pair datasets is costly and time-consuming, even though many drug-disease pairs remain buried in free text. In this work, we first use a pattern-based method to automatically extract drug-disease pairs with treatment and inducement relationships from free text. Then, to reflect the drug-disease relation, we propose a network embedding algorithm that calculates the degree of correlation of a drug-disease pair. In the experiments, we apply the method to 27 million medical abstracts and titles available on PubMed, extracting 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting frequent treatment drug-disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting frequent inducement drug-disease pairs. In addition, the proposed information network embedding algorithm efficiently reflects the degree of correlation of drug-disease pairs. Our algorithm achieves a precision of 0.802 and a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
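The pattern-based step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy lexicons `DRUGS` and `DISEASES`, the four regular expressions in `PATTERNS`, and the helper names `extract_pairs` and `precision_recall` do not come from the paper, whose actual patterns and vocabularies are not given in this record. The network embedding step is not sketched.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; the paper's real vocabularies are not listed in this record.
DRUGS = ["aspirin", "metformin", "cisplatin"]
DISEASES = ["headache", "type 2 diabetes", "nephrotoxicity"]


def _alt(terms):
    # Longest-first alternation so multi-word names are tried before shorter ones.
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


DRUG, DISEASE = _alt(DRUGS), _alt(DISEASES)

# Illustrative surface patterns only; not the patterns used by the authors.
PATTERNS = {
    "treatment": [
        rf"\b(?P<drug>{DRUG})\b (?:for|in) the treatment of \b(?P<disease>{DISEASE})\b",
        rf"\b(?P<disease>{DISEASE})\b (?:was|were|is|are) treated with \b(?P<drug>{DRUG})\b",
    ],
    "inducement": [
        rf"\b(?P<drug>{DRUG})\b[- ]induced \b(?P<disease>{DISEASE})\b",
        rf"\b(?P<disease>{DISEASE})\b (?:induced|caused) by \b(?P<drug>{DRUG})\b",
    ],
}


def extract_pairs(sentences):
    """Count (relation, drug, disease) triples matched by any pattern."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        for relation, patterns in PATTERNS.items():
            for pattern in patterns:
                for match in re.finditer(pattern, text):
                    counts[(relation, match["drug"], match["disease"])] += 1
    return counts


def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted triples against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Counting how often each triple occurs makes it possible to separate frequent pairs from rare ones, which is the distinction the reported precision and recall figures rely on; the frequency threshold itself is not stated in this record.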
Pages: 2649-2661
Number of pages: 13