A Multi-View Deep Neural Network Model for Chemical-Disease Relation Extraction From Imbalanced Datasets

Cited by: 13

Authors:

Mitra, Sayantan [1]

Saha, Sriparna [1]

Hasanuzzaman, Mohammed [2]

Affiliations:

[1] Indian Inst Technol Patna, Dept Comp Sci, Bihta 801103, India

[2] Cork Inst Technol, Dept Comp Sci, ADAPT Ctr, Bishopstown T12 P928, Ireland

Source:

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS | 2020 / Vol. 24 / Issue 11

Keywords:

Feature extraction; Neural networks; Biological system modeling; Diseases; Task analysis; Chemicals; Convolution; Multi-view classification; relation extraction; chemical-disease relations; imbalanced class; text mining;

DOI:

10.1109/JBHI.2020.2983365

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Understanding chemical-disease relations (CDR) is a crucial task in various biomedical domains. Manually mining this information from the biomedical literature is costly and time-consuming. To address these issues, considerable research has been carried out to design an efficient automatic tool. In this paper, we propose a multi-view deep neural network model for the CDR task. Typically, multiple representations (or views) of the datasets are not available for this task. We therefore train multiple conceptually different deep neural network models on the dataset to generate different abstract features, which are treated as different views. A novel loss function, "Penalized LF", is defined to address the problem of imbalanced datasets. The proposed loss function is generic in nature. The model is designed as a combination of a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network along with a Multi-Layer Perceptron (MLP). To show the efficacy of the proposed model, we compare it with six baseline models and other state-of-the-art techniques on the "chemicals-and-disease-DFE" dataset, a free-text dataset created by Li et al. from the BioCreative V Chemical Disease Relation dataset. Results show that the proposed model attains the highest F1-score for individual classes, indicating its effectiveness in handling the class imbalance problem in the dataset. To further demonstrate the efficacy of the proposed model, we present results on the BioCreative V dataset and two Protein-Protein Interaction (PPI) identification datasets, viz., AiMed and BioInfer. All of these results are also compared with those of state-of-the-art models.


Pages: 3315-3325

Page count: 11
