Large-scale extraction of drug-disease pairs from the medical literature

Times cited: 20

Authors:

Wang, Pengwei [1]

Hao, Tianyong [2]

Yan, Jun [3]

Jin, Lianwen [1]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China

[2] Guangdong Univ Foreign Studies, Cisco Sch Informat, Guangzhou, Guangdong, Peoples R China

[3] Microsoft Res Asia, Beijing, Peoples R China

Source:

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY | 2017 / Vol. 68 / No. 11

Keywords:

KNOWLEDGE; ACQUISITION

DOI:

10.1002/asi.23876

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Automatic extraction of large-scale, accurate drug-disease pairs from the medical literature plays an important role in drug repurposing. However, many existing extraction methods are supervised, and manually labeling drug-disease pair datasets is costly and time-consuming, while many drug-disease pairs remain buried in free text. In this work, we first leverage a pattern-based method to automatically extract drug-disease pairs with treatment and inducement relationships from free text. Then, to quantify a drug-disease relation, we propose a network embedding algorithm that calculates the degree of correlation of a drug-disease pair. In the experiments, we apply the method to 27 million medical abstracts and titles available on PubMed and extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting frequent treatment drug-disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting frequent inducement drug-disease pairs. In addition, our proposed information network embedding algorithm efficiently reflects the degree of correlation of drug-disease pairs, achieving a precision of 0.802 and a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
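The abstract does not list the extraction patterns themselves. As a rough illustration only, the Python sketch below shows one way dictionary-anchored lexical patterns could pull treatment and inducement pairs out of sentences, count how often each unique pair occurs, and keep the "frequent" ones. The lexicons (DRUGS, DISEASES), the pattern set, the helper extract_pairs, and the MIN_COUNT threshold are all hypothetical stand-ins, not the authors' method; the network-embedding correlation step is not modeled.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (assumption): a real pipeline would use curated
# drug and disease vocabularies rather than these hand-picked terms.
DRUGS = {"metformin", "aspirin", "cisplatin"}
DISEASES = {"type 2 diabetes", "nephrotoxicity", "headache"}


def _alternation(terms):
    # Longest terms first so a multi-word name is preferred over a shorter prefix.
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


DRUG = rf"(?P<drug>{_alternation(DRUGS)})"
DISEASE = rf"(?P<disease>{_alternation(DISEASES)})"

# Hypothetical surface patterns (assumption), grouped by relation type.
PATTERNS = {
    "treatment": [
        # "<drug> [up to 3 words] for/in the treatment of <disease>"
        re.compile(rf"\b{DRUG}\b (?:\w+ ){{0,3}}?(?:for|in) the treatment of {DISEASE}\b"),
        # "<disease> was/were treated with <drug>"
        re.compile(rf"\b{DISEASE}\b (?:was |were )?treated with {DRUG}\b"),
    ],
    "inducement": [
        # "<drug>-induced <disease>"
        re.compile(rf"\b{DRUG}-induced {DISEASE}\b"),
        # "<disease> was/were induced/caused by <drug>"
        re.compile(rf"\b{DISEASE}\b (?:was |were )?(?:induced|caused) by {DRUG}\b"),
    ],
}


def extract_pairs(sentences):
    """Count (relation, drug, disease) triples, at most once per sentence."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        found = set()  # dedupe when several patterns hit the same pair
        for relation, patterns in PATTERNS.items():
            for pattern in patterns:
                for m in pattern.finditer(text):
                    found.add((relation, m.group("drug"), m.group("disease")))
        counts.update(found)
    return counts


corpus = [
    "Metformin is widely used for the treatment of type 2 diabetes.",
    "Type 2 diabetes was treated with metformin in this cohort.",
    "We report a case of cisplatin-induced nephrotoxicity.",
    "Aspirin in the treatment of headache was evaluated.",
]
counts = extract_pairs(corpus)
for (relation, drug, disease), n in counts.most_common():
    print(relation, drug, disease, n)

# Keep only pairs seen at least MIN_COUNT times (hypothetical threshold).
MIN_COUNT = 2
frequent = {pair: n for pair, n in counts.items() if n >= MIN_COUNT}
print(frequent)
```

Fixed surface patterns like these favor precision over recall, since any phrasing outside the pattern set is missed; the figures reported in the abstract come from the authors' own patterns and the full PubMed corpus, which this toy example does not reproduce.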


Pages: 2649-2661

Number of pages: 13

Number of references: 36
