Contextual classification of clinical records with bidirectional long short-term memory (Bi-LSTM) and bidirectional encoder representations from transformers (BERT) model

Cited by: 4
Authors
Zalte, Jaya [1]
Shah, Harshal [1]
Affiliations
[1] Parul Univ, Fac Engn & Technol, Vadodara, India
Keywords
BERT; Bi-LSTM; EHR; NLP; word embeddings; ensemble
DOI
10.1111/coin.12692
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning models have outperformed traditional machine learning techniques for text classification in natural language processing (NLP). NLP, a branch of machine learning used for interpreting language and classifying text of interest, can also be applied to analyse clinical electronic health records. Medical text contains a lot of rich data that, taken together, can provide good insight once patterns in the clinical text are identified. In this paper, bidirectional long short-term memory (Bi-LSTM), Bi-LSTM with attention, and bidirectional encoder representations from transformers (BERT) base models are used to classify text that is of privacy concern to a person, so that it can be extracted and tagged as sensitive. Text that might not appear to be of privacy concern can reveal a great deal about a patient's integrity and personal life. Clinical data contain not only patient demographic data but also a lot of hidden data that might go unnoticed and could therefore give rise to privacy issues. An attention layer is added on top of the Bi-LSTM to capture the importance of the critical words that matter most for classification; this model achieves an accuracy of about 92%. About 206,926 sentences are used, of which 80% are used for training and the rest for testing; Bi-LSTM alone achieves an accuracy of approximately 90%. The BERT model, applied to the same dataset, achieves an accuracy of approximately 93%.
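The abstract mentions an 80/20 train/test split over about 206,926 sentences and an attention layer that weights critical words on top of the Bi-LSTM outputs. The following is only a minimal sketch of those two ideas in plain Python. The abstract does not state the attention formulation, so a common dot-product softmax pooling is assumed here; the function names `split_counts` and `attention_pool`, the toy hidden states, and the fixed query vector are all hypothetical and not taken from the paper.

```python
import math


def split_counts(n_sentences, train_fraction=0.8):
    """Return (train, test) sizes for a simple holdout split."""
    n_train = int(n_sentences * train_fraction)
    return n_train, n_sentences - n_train


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of per-token hidden states.

    hidden_states: list of equal-length vectors, one per token
                   (stand-ins for Bi-LSTM outputs).
    query: vector of the same length; learned in a real model,
           fixed here purely for illustration (assumption).
    """
    # Dot-product relevance score for each token.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax over the token scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Split sizes for the sentence count quoted in the abstract.
train_size, test_size = split_counts(206_926)

# Toy example: three tokens with 2-dimensional hidden states.
weights, context = attention_pool(
    [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]], [1.0, 1.0]
)
```

In this toy input the second token has the largest dot product with the query, so it receives the largest attention weight and dominates the pooled context vector that a classifier layer would then consume.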
Pages: 17