Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods

Cited by: 3
Authors
Chng, Seo Yi [1]
Tern, Paul J. W. [2]
Kan, Matthew R. X. [3]
Cheng, Lionel T. E. [4]
Affiliations
[1] Natl Univ Singapore, Dept Paediat, 5 Lower Kent Ridge Rd, Singapore 119074, Singapore
[2] Natl Heart Ctr, Dept Cardiol, Singapore, Singapore
[3] NUS High Sch Math & Sci, Singapore, Singapore
[4] Singapore Gen Hosp, Dept Diagnost Radiol, Singapore, Singapore
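Source
HEALTH CARE SCIENCE | 2023, Vol. 2, No. 2
Funding
UK Research and Innovation
Keywords
automated labelling; machine learning; natural language processing; neural network; radiology
DOI
10.1002/hcs2.40
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Subject classification code
Abstract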
Automated labelling of radiology reports using natural language processing allows ground truth labels to be generated for the large datasets of radiological studies that are required to train computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling, namely: (1) rules-based text-matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules-based labellers perform a brute force search against manually curated keywords and are able to achieve high F1 scores. However, they require proper handling of negative words. Machine learning models require preprocessing that involves tokenization and vectorization of text into numerical vectors. Labelling radiology reports requires multilabel classification approaches, and conventional models can achieve good performance if their training sets are large enough. Deep learning models make use of connected neural networks, often a long short-term memory network, and are similarly able to achieve good performance if trained on a large dataset. BERT is a transformer-based model that utilizes attention. Pretrained BERT models only require fine-tuning with small datasets. In particular, domain-specific BERT models can achieve superior performance compared with the other methods for automated labelling.
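As a rough illustration of the first method, a rules-based labeller with basic negation handling might look like the sketch below. It is not the authors' implementation: the keyword lists, the negation triggers, the sentence-splitting rule and the function name `label_report` are all assumptions made for this example, and a real labeller would need far larger curated vocabularies and more careful negation scoping.

```python
import re

# Hypothetical keyword lists, invented for illustration only.
KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}

# Hypothetical negation triggers; a keyword is treated as negated when any
# trigger appears earlier in the same sentence.
NEGATIONS = ["no", "without", "negative for"]


def contains(phrase, text):
    """Return the first whole-word match of phrase in text, or None."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text)


def label_report(report):
    """Return a multilabel dict: 1 if a finding is asserted, else 0."""
    labels = {name: 0 for name in KEYWORDS}
    # Split into sentences so that a negation only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for name, terms in KEYWORDS.items():
            for term in terms:
                match = contains(term, sentence)
                if match is None:
                    continue
                preceding = sentence[: match.start()]
                negated = any(contains(neg, preceding) for neg in NEGATIONS)
                if not negated:
                    labels[name] = 1
    return labels


if __name__ == "__main__":
    print(label_report("No pneumothorax. Small left pleural effusion."))
```

Because each report can carry several findings at once, the function returns one binary flag per finding rather than a single class, which mirrors the multilabel setting described in the abstract.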
Pages: 120-128
Page count: 9