Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach

Cited by: 19
Authors
Linda Hammami [1]
Alessia Paglialonga [2]
Giancarlo Pruneri [3, 4]
Michele Torresani [5]
Milena Sant [1]
Carlo Bono [6]
Enrico Gianluca Caiani [2, 7]
Paolo Baili [1]
Affiliations
[1] Fondazione IRCCS Istituto Nazionale dei Tumori, Analytical Epidemiology and Health Impact Unit, Via Venezian 1, I-20133 Milan, Italy
[2] National Research Council of Italy (CNR), Institute of Electronics, Computer and Telecommunication Engineering (IEIIT), Milan, Italy
[3] Fondazione IRCCS Istituto Nazionale dei Tumori, Department of Pathology, Milan, Italy
[4] University of Milan, School of Medicine, Milan, Italy
[5] Fondazione IRCCS Istituto Nazionale dei Tumori, Health Directorate, Milan, Italy
[6] Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
[7] Politecnico di Milano, Department of Electronics, Information and Bioengineering, Milan, Italy
Keywords
Natural Language Processing; Italian language; Pathology reports; Cancer morphology
DOI
10.1016/j.jbi.2021.103712
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Pathology reports are a primary source of information for cancer registries. Hospitals routinely process high volumes of free-text reports, which are a valuable source of information on cancer diagnosis for improving clinical care and supporting research. Extracting and coding information from such unstructured text is typically a manual, labour-intensive process, so automated approaches are needed that extract meaningful information from these texts reliably and accurately. Natural Language Processing (NLP) algorithms offer a unique opportunity to automatically encode unstructured reports into structured data, and thus represent a potentially powerful alternative to expensive manual processing. However, despite the increasing interest in this area, few NLP approaches are available to date for pathology reports in languages other than English, including Italian. The aim of our work was to develop an automated algorithm based on NLP techniques that identifies and classifies the morphological content of Italian-language pathology reports with micro-averaged performance scores higher than 95%. Specifically, a novel, domain-specific classifier based on linguistic rules was developed and tested on 27,239 pathology reports from a single Italian oncological centre, following the morphology classification standard of the International Classification of Diseases for Oncology (ICD-O-M). The proposed classification algorithm achieved a micro-F1 score of 98.14% on the 9594 pathology reports in the test dataset. Although the algorithm relies on rules defined on data from a single hospital dedicated specifically to cancer, it is built on general processing steps that can be applied to other datasets. Further research will be needed to demonstrate the generalizability of the proposed approach on a larger corpus from different hospitals.
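The abstract does not list the linguistic rules themselves, so the following is only a minimal sketch of what a rule-based morphology classifier and its micro-F1 evaluation can look like. It assumes a hand-written table of regular expressions over normalised Italian text, ordered from most to least specific (first match wins), plus a simple sentence-level negation filter. The rule patterns, negation cues, and the names `RULES`, `NEGATION_CUES`, `classify`, and `micro_f1` are hypothetical illustrations, not the authors' method; the codes used are standard ICD-O-3 morphology codes. Scoring an abstention (`None`) as a false negative but not as a false positive is also an assumption.

```python
import re
from typing import List, Optional

# HYPOTHETICAL rule table (not the authors' rules): regex over normalised
# Italian text -> ICD-O-3 morphology code. Ordered from most to least specific;
# the first rule that fires wins, so "carcinoma duttale infiltrante" is not
# swallowed by the generic "carcinoma" rule.
RULES = [
    (re.compile(r"\bcarcinoma duttale infiltrante\b"), "8500/3"),
    (re.compile(r"\bcarcinoma lobulare infiltrante\b"), "8520/3"),
    (re.compile(r"\badenocarcinoma\b"), "8140/3"),
    (re.compile(r"\bmelanoma\b"), "8720/3"),
    (re.compile(r"\bcarcinoma\b"), "8010/3"),
]

# HYPOTHETICAL negation cues: any sentence containing one is ignored.
NEGATION_CUES = re.compile(r"\b(assenza di|non si osserva|negativ[oa] per)\b")


def normalise(text: str) -> str:
    """Lower-case the report and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def classify(report: str) -> Optional[str]:
    """Return the code of the highest-priority rule that fires, or None."""
    sentences = [s for s in re.split(r"[.;]", normalise(report)) if s.strip()]
    affirmed = [s for s in sentences if not NEGATION_CUES.search(s)]
    for pattern, code in RULES:
        if any(pattern.search(s) for s in affirmed):
            return code
    return None  # abstain: no rule fired on an affirmed sentence


def micro_f1(gold: List[Optional[str]], pred: List[Optional[str]]) -> float:
    """Micro-averaged F1: TP/FP/FN pooled over all codes; None = no code."""
    tp = sum(1 for g, p in zip(gold, pred) if p is not None and p == g)
    fp = sum(1 for g, p in zip(gold, pred) if p is not None and p != g)
    fn = sum(1 for g, p in zip(gold, pred) if g is not None and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reports = [
        "Carcinoma duttale infiltrante, G2. Margini indenni.",
        "Adenocarcinoma del colon moderatamente differenziato.",
        "Nevo melanocitico. Assenza di melanoma.",
        "Carcinoma lobulare infiltrante della mammella.",
        "Melanoma cutaneo a diffusione superficiale.",
    ]
    # Toy gold labels; 8743/3 (superficial spreading melanoma) is deliberately
    # more specific than any rule above, so the last report is misclassified.
    gold = ["8500/3", "8140/3", None, "8520/3", "8743/3"]
    pred = [classify(r) for r in reports]
    print(pred)
    print(f"micro-F1 = {micro_f1(gold, pred):.2f}")  # micro-F1 = 0.75
```

Ordering rules by specificity, rather than by where a phrase appears in the report, is one simple way to let a precise code override a generic one. Because micro-averaging pools counts across all codes, frequent morphologies dominate the resulting score.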
Number of pages: 7