A Syntactic Information-Based Classification Model for Medical Literature: Algorithm Development and Validation Study

Cited by: 1
Authors
Tang, Wentai [1]
Wang, Jian [1,2]
Lin, Hongfei [1]
Zhao, Di [1]
Xu, Bo [1]
Zhang, Yijia [1]
Yang, Zhihao [1]
Affiliations
[1] Dalian Univ Technol, Coll Comp Sci & Technol, Dalian, Peoples R China
[2] Dalian Univ Technol, Coll Comp Sci & Technol, 2 Linggong Rd, Dalian 116023, Peoples R China
Keywords
medical relation extraction; syntactic features; pruning method; neural networks; medical literature; medical text; extraction; syntactic; classification; interaction; text; literature; semantic
DOI
10.2196/37817
Chinese Library Classification (CLC) number
R-058
Discipline classification code
Abstract
Background: The ever-increasing volume of medical literature makes classifying that literature necessary. Medical relation extraction is a typical method for classifying large volumes of medical literature. As computing power has grown, medical relation extraction models have evolved from rule-based models to neural network models. However, a single neural network model, in discarding the traditional rules, also discards shallow syntactic information. Therefore, we propose a syntactic information-based classification model that complements and equalizes syntactic information to enhance the model.
Objective: We aim to build a syntactic information-based relation extraction model for more efficient medical literature classification.
Methods: We devised 2 methods for enhancing syntactic information in the model. First, we introduced shallow syntactic information into the convolutional neural network to enhance nonlocal syntactic interactions. Second, we devised a cross-domain pruning method to equalize local and nonlocal syntactic interactions.
Results: We experimented with 3 data sets related to the classification of medical literature. The F1 values were 65.5% and 91.5% on the BioCreative VI CPR and Phenotype-Gene Relationship data sets, respectively, and the accuracy was 88.7% on the PubMed data set. Our model outperformed the current state-of-the-art baseline model in the experiments.
Conclusions: Our model based on syntactic information effectively enhances medical relation extraction. Furthermore, the experimental results show that shallow syntactic information helps capture nonlocal interactions in sentences and effectively reinforces syntactic features. These findings also suggest new directions for future research.
(JMIR Med Inform 2022;10(8):e37817) doi: 10.2196/37817
Pages: 10