Natural Language Processing in Dutch Free Text Radiology Reports: Challenges in a Small Language Area Staging Pulmonary Oncology

Cited by: 18
Authors
Nobel, J. Martijn [1, 2]
Puts, Sander [3]
Bakers, Frans C. H. [1]
Robben, Simon G. F. [1, 2]
Dekker, Andre L. A. J. [3]
Affiliations
[1] Maastricht Univ, Med Ctr, Dept Radiol & Nucl Med, Postbox 5800, NL-6202 AZ Maastricht, Netherlands
[2] Maastricht Univ, Sch Hlth Profess Educ, Maastricht, Netherlands
[3] Maastricht Univ, Med Ctr, GROW Sch Oncol & Dev Biol, Dept Radiat Oncol MAASTRO, Maastricht, Netherlands
Keywords
Radiology; Reporting; Natural language processing; Free text; Classification system; Machine learning; Classification
DOI
10.1007/s10278-020-00327-z
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Reports are the standard way of communication between the radiologist and the referring clinician. Efforts are made to improve this communication by, for instance, introducing standardization and structured reporting. Natural Language Processing (NLP) is another promising tool which can improve and enhance the radiological report by processing free text. NLP as such adds structure to the report and exposes the information, which in turn can be used for further analysis. This paper describes pre-processing and processing steps and highlights important challenges to overcome in order to successfully implement a free text mining algorithm using NLP tools and machine learning in a small language area, like Dutch. A rule-based algorithm was constructed to classify T-stage of pulmonary oncology from the original free text radiological report, based on the items tumor size, presence and involvement according to the 8th TNM classification system. PyContextNLP, spaCy and regular expressions were used as tools to extract the correct information and process the free text. Overall accuracy of the algorithm for evaluating T-stage was 0.83 in the training set and 0.87 in the validation set, which shows that the approach in this pilot study is promising. Future research with larger datasets and external validation is needed to be able to introduce more machine learning approaches and perhaps to reduce required input efforts of domain-specific knowledge. However, a hybrid NLP approach will probably achieve the best results.
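As a rough illustration of the size-based part of such a rule-based approach, the sketch below pulls tumor measurements out of a Dutch sentence with a regular expression and maps the largest one to a T category using the size thresholds of the 8th TNM edition for lung cancer. It is not the authors' algorithm: the pattern, the function names `extract_sizes_cm` and `t_stage_from_size`, and the example sentences are all hypothetical, and the presence and involvement criteria that the abstract mentions are deliberately left out.

```python
import re

# Hypothetical pattern: a number with an optional decimal comma or point,
# followed by a unit. Dutch reports commonly use a decimal comma ("3,4 cm").
# Multi-dimensional sizes such as "3 x 4 cm" are not handled in this sketch.
SIZE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def extract_sizes_cm(text):
    """Return every measurement found in the text, converted to centimetres."""
    sizes = []
    for value, unit in SIZE_RE.findall(text):
        number = float(value.replace(",", "."))
        sizes.append(number / 10 if unit.lower() == "mm" else number)
    return sizes


def t_stage_from_size(size_cm):
    """Map a tumor diameter in cm to a T category by size alone (8th TNM).

    Assumption: size is the only criterion. Real staging also depends on
    invasion of surrounding structures and on separate tumor nodules.
    """
    if size_cm <= 1:
        return "T1a"
    if size_cm <= 2:
        return "T1b"
    if size_cm <= 3:
        return "T1c"
    if size_cm <= 4:
        return "T2a"
    if size_cm <= 5:
        return "T2b"
    if size_cm <= 7:
        return "T3"
    return "T4"


def classify_report(text):
    """Classify on the largest measurement; return None if none is found."""
    sizes = extract_sizes_cm(text)
    return t_stage_from_size(max(sizes)) if sizes else None


if __name__ == "__main__":
    # Invented example sentence, not taken from the study data.
    print(classify_report("Massa in de rechter bovenkwab van 3,4 cm."))
```

A size-only rule like this would misstage tumors that are small but invade neighbouring structures, which is one reason the abstract describes combining it with context tools such as PyContextNLP and spaCy.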
Pages: 1002-1008
Page count: 7
Related papers
50 in total
  • [41] Extraction of Disease Symptoms from Free Text Using Natural Language Processing Techniques
    Laabidi, Adil
    Aissaoui, Mohammed
    Madani, Mohamed Amine
    PROCEEDINGS OF NINTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, VOL 2, ICICT 2024, 2024, 1012 : 549 - 561
  • [42] Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing
    Yu, Sheng
    Kumamaru, Kanako K.
    George, Elizabeth
    Dunne, Ruth M.
    Bedayat, Arash
    Neykov, Matey
    Hunsaker, Andetta R.
    Dill, Karin E.
    Cai, Tianxi
    Rybicki, Frank J.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 52 : 386 - 393
  • [43] Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing
    Callen, Andrew L.
    Dupont, Sara M.
    Price, Adi
    Laguna, Ben
    McCoy, David
    Do, Bao
    Talbott, Jason
    Kohli, Marc
    Narvid, Jared
    JOURNAL OF DIGITAL IMAGING, 2020, 33 (05) : 1194 - 1201
  • [44] Characterization of Change and Significance for Clinical Findings in Radiology Reports Through Natural Language Processing
    Saeed Hassanpour
    Graham Bay
    Curtis P. Langlotz
    Journal of Digital Imaging, 2017, 30 : 314 - 322
  • [45] Characterization of Change and Significance for Clinical Findings in Radiology Reports Through Natural Language Processing
    Hassanpour, Saeed
    Bay, Graham
    Langlotz, Curtis P.
    JOURNAL OF DIGITAL IMAGING, 2017, 30 (03) : 314 - 322
  • [46] The reporting quality of natural language processing studies: systematic review of studies of radiology reports
    Emma M. Davidson
    Michael T. C. Poon
    Arlene Casey
    Andreas Grivas
    Daniel Duma
    Hang Dong
    Víctor Suárez-Paniagua
    Claire Grover
    Richard Tobin
    Heather Whalley
    Honghan Wu
    Beatrice Alex
    William Whiteley
    BMC Medical Imaging, 21
  • [47] Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing
    Andrew L. Callen
    Sara M. Dupont
    Adi Price
    Ben Laguna
    David McCoy
    Bao Do
    Jason Talbott
    Marc Kohli
    Jared Narvid
    Journal of Digital Imaging, 2020, 33 : 1194 - 1201
  • [48] Natural language processing of radiology reports for identification of skeletal site-specific fractures
    Yanshan Wang
    Saeed Mehrabi
    Sunghwan Sohn
    Elizabeth J. Atkinson
    Shreyasee Amin
    Hongfang Liu
    BMC Medical Informatics and Decision Making, 19
  • [49] Evaluation of Document-Level Identification of Pulmonary Nodules in Radiology Reports Using FLAIR Natural Language Processing Framework
    Oian, Ray
    Fu, Sunyang
    Liu, Hongfang
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022, : 515 - 516
  • [50] The reporting quality of natural language processing studies: systematic review of studies of radiology reports
    Davidson, Emma M.
    Poon, Michael T. C.
    Casey, Arlene
    Grivas, Andreas
    Duma, Daniel
    Dong, Hang
    Suarez-Paniagua, Victor
    Grover, Claire
    Tobin, Richard
    Whalley, Heather
    Wu, Honghan
    Alex, Beatrice
    Whiteley, William
    BMC MEDICAL IMAGING, 2021, 21 (01)