Chemical-protein Interaction Extraction via ChemicalBERT and Attention Guided Graph Convolutional Networks in Parallel

Cited by: 6
Authors
Qin, Lei [1]
Dong, Gaocai [1]
Peng, Jing [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE | 2020
Funding
National Natural Science Foundation of China;
Keywords
chemical-protein interaction; ChemicalBERT; graph convolutional network; parallel model; biomedical relation extraction; DRUG-DRUG INTERACTIONS;
DOI
10.1109/BIBM49941.2020.9313234
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Automated recognition of functional interactions between compounds and proteins/genes from biomedical literature is essential for drug discovery, knowledge understanding, and basic clinical research. Although several computational methods have achieved competitive performance in extracting these relations, there is significant room for improvement in fully capturing the complex semantic and syntactic information within sentences. We present a novel parallel model to improve chemical-protein interaction (CPI) extraction. Specifically, the model consists of two parallel components: ChemicalBERT and Attention Guided Graph Convolutional Networks (AGGCN). We pre-train BERT on large-scale chemical interaction corpora, naming the result ChemicalBERT, to generate high-quality contextual representations, and employ AGGCN to capture the syntactic graph information of the sentence. Finally, the contextual representation and the syntactic graph representation are merged in a fusion layer and then fed into a fully connected softmax layer to extract CPIs. We evaluate the proposed model on the ChemProt corpus, the benchmark corpus of this domain, and achieve state-of-the-art results for CPI extraction with a micro-averaged F1-score of 80.21%. To further demonstrate the efficacy of the proposed model, we also conducted experiments on the DDIExtraction 2013 corpus and obtained a micro-averaged F1-score of 82.88%, which is also the highest score among existing models. Experimental results show that the proposed model can adequately capture semantic and syntactic information by extracting sentence features from different views in parallel. The code is available at https://github.com/qlbio/CPR extraction.
Pages: 708-715
Page count: 8