Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach

Cited by: 19
Authors
Hammami, Linda [1]
Paglialonga, Alessia [2]
Pruneri, Giancarlo [3,4]
Torresani, Michele [5]
Sant, Milena [1]
Bono, Carlo [6]
Caiani, Enrico Gianluca [2,7]
Baili, Paolo [1]
Affiliations
[1] Fdn IRCCS Ist Nazl Tumori, Analyt Epidemiol & Hlth Impact Unit, Via Venezian 1, I-20133 Milan, Italy
[2] Natl Res Council Italy CNR, Inst Elect Comp & Telecommun Engn IEIIT, Milan, Italy
[3] Fdn IRCCS Ist Nazl Tumori, Pathol Dept, Milan, Italy
[4] Univ Milan, Sch Med, Milan, Italy
[5] Fdn IRCCS Ist Nazl Tumori, Hlth Direct, Milan, Italy
[6] Fdn IRCCS Ist Nazl Tumori, Milan, Italy
[7] Politecn Milan, Elect Informat & Biomed Engn Dept, Milan, Italy
Keywords
Natural Language Processing; Italian language; Pathology Reports; Cancer morphology
DOI
10.1016/j.jbi.2021.103712
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Pathology reports represent a primary source of information for cancer registries. Hospitals routinely process high volumes of free-text reports, a valuable source of information regarding cancer diagnosis for improving clinical care and supporting research. Information extraction and coding of unstructured textual data is typically a manual, labour-intensive process. There is a need to develop automated approaches to extract meaningful information from such texts in a reliable and accurate way. In this scenario, Natural Language Processing (NLP) algorithms offer a unique opportunity to automatically encode the unstructured reports into structured data, thus representing a potentially powerful alternative to expensive manual processing. However, notwithstanding the increasing interest in this area, there is still limited availability of NLP approaches for pathology reports in languages other than English, including Italian. The aim of our work was to develop an automated algorithm based on NLP techniques, able to identify and classify the morphological content of pathology reports in the Italian language with micro-averaged performance scores higher than 95%. Specifically, a novel, domain-specific classifier that uses linguistic rules was developed and tested on 27,239 pathology reports from a single Italian oncological centre, following the International Classification of Diseases for Oncology morphology classification standard (ICD-O-M). The proposed classification algorithm achieved successful results with a micro-F1 score of 98.14% on 9,594 pathology reports in the test dataset. This algorithm relies on rules defined on data from a single hospital that is specifically dedicated to cancer, but it is based on general processing steps which can be applied to different datasets. Further research will be important to demonstrate the generalizability of the proposed approach on a larger corpus from different hospitals.
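For readers unfamiliar with the micro-averaged metric quoted in the abstract, the following is a minimal sketch of how a micro-F1 score can be computed. It assumes each report receives exactly one predicted label, which the abstract does not confirm. The function name `micro_f1` and the ICD-O-M style codes in the example are illustrative only and are not taken from the paper.

```python
from collections import Counter


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1.

    Assumes single-label classification (one label per report); this is an
    assumption for illustration, not a detail confirmed by the abstract.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct label: true positive for that class
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[g] += 1   # gold class gains a false negative
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    recall = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels in ICD-O-M style, for illustration only.
gold = ["8140/3", "8500/3", "8070/3", "8500/3"]
predicted = ["8140/3", "8500/3", "8140/3", "8500/3"]
score = micro_f1(gold, predicted)
print(f"micro-F1 = {score:.4f}")
```

Under the single-label assumption, every misclassification counts once as a false positive and once as a false negative, so micro-averaged precision, recall, and F1 all coincide with overall accuracy.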
Pages: 7
Related papers
50 in total
  • [41] Automated Detection of Cancer-Suspicious Findings in Japanese Radiology Reports with Natural Language Processing: A Multicenter Study
    Sugimoto, Kento
    Wada, Shoya
    Konishi, Shozo
    Sato, Junya
    Okada, Katsuki
    Kido, Shoji
    Tomiyama, Noriyuki
    Matsumura, Yasushi
    Takeda, Toshihiro
    JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2025,
  • [42] Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach
    Wei-Hung Weng
    Kavishwar B. Wagholikar
    Alexa T. McCray
    Peter Szolovits
    Henry C. Chueh
    BMC Medical Informatics and Decision Making, 17
  • [43] Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach
    Weng, Wei-Hung
    Wagholikar, Kavishwar B.
    McCray, Alexa T.
    Szolovits, Peter
    Chueh, Henry C.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2017, 17
  • [44] A novel classification approach for Android malware based on feature fusion and natural language processing
    Chen, Jinfu
    Zhao, Zian
    Chen, Xiao
    Cai, Saihua
    Yin, Shang
    Song, Luo
    13TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2022, 2022, : 28 - 36
  • [45] Systematic analysis of constellation-based techniques by using Natural Language Processing
    Perazzoli, Simone
    de Santana Neto, Jose Pedro
    Mathias Barreto de Menezes, Milton Jose
    TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE, 2022, 179
  • [46] Classification of neurologic outcomes from medical notes using natural language processing
    Fernandes, Marta B.
    Valizadeh, Navid
    Alabsi, Haitham S.
    Quadri, Syed A.
    Tesh, Ryan A.
    Bucklin, Abigail A.
    Sun, Haoqi
    Jain, Aayushee
    Brenner, Laura N.
    Ye, Elissa
    Ge, Wendong
    Collens, Sarah I.
    Lin, Stacie
    Das, Sudeshna
    Robbins, Gregory K.
    Zafar, Sahar F.
    Mukerji, Shibani S.
    Westover, M. Brandon
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 214
  • [47] Using natural language processing for automated classification of disease and to identify misclassified ICD codes in cardiac disease
    Falter, Maarten
    Godderis, Dries
    Scherrenberg, Martijn
    Kizilkilic, Sevda Ece
    Xu, Linqi
    Mertens, Marc
    Jansen, Jan
    Legroux, Pascal
    Kindermans, Hanne
    Sinnaeve, Peter
    Neven, Frank
    Dendale, Paul
    EUROPEAN HEART JOURNAL - DIGITAL HEALTH, 2024, 5 (03): : 229 - 234
  • [48] Discerning Tumor Status from Unstructured MRI Reports—Completeness of Information in Existing Reports and Utility of Automated Natural Language Processing
    Lionel T. E. Cheng
    Jiaping Zheng
    Guergana K. Savova
    Bradley J. Erickson
    Journal of Digital Imaging, 2010, 23 : 119 - 132
  • [49] Automated Requirements Identification from Construction Contract Documents Using Natural Language Processing
    Hassan, Fahad Ul
    Le, Tuyen
    JOURNAL OF LEGAL AFFAIRS AND DISPUTE RESOLUTION IN ENGINEERING AND CONSTRUCTION, 2020, 12 (02)
  • [50] Automated Extraction of BI-RADS Final Assessment Categories from Radiology Reports with Natural Language Processing
    Sippo, Dorothy A.
    Warden, Graham I.
    Andriole, Katherine P.
    Lacson, Ronilda
    Ikuta, Ichiro
    Birdwell, Robyn L.
    Khorasani, Ramin
    JOURNAL OF DIGITAL IMAGING, 2013, 26 (05) : 989 - 994