Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach

Times cited: 3
Authors
Viscosi, Carmelo [1]
Fidelbo, Paolo [1]
Benedetto, Andrea [1]
Varvara, Massimo [1]
Ferrante, Margherita [1]
Affiliations
[1] Azienda Osped Univ Policlin G Rodolico San Marco, Registro Tumori Integrato Catania Messina Enna, UOC Igiene, Dipartimento GF Ingrassia, Via S Sofia 87, I-95123 Catania, Italy
Keywords
Machine learning; Binary classification; Natural language processing; Cancer registry; Automated classification; Pathology
DOI
10.1016/j.ijmedinf.2022.104714
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Histopathology reports are a primary data source for the case definition phase of a Cancer Registry. By reading the histopathology report, the operator who evaluates an oncology case can define the morphology and topography of the cancer and validate the case with the highest basis of diagnosis. The key problem for the Catania-Messina-Enna Integrated Cancer Registry (RTI) is that these reports are written in natural language, and only a small fraction of the histopathology reports produced each year contain information relevant to cancer evaluation. In this population-based retrospective cohort study, we aim to reduce the working time RTI operators spend searching for and selecting the right information among the histopathology reports of the eastern Sicily population. To do so, we developed a binary classifier on a training set of labeled historical data and validated its output on a test set of labeled data created by the operators over the years. Using a machine learning algorithm, we built a classification model that evaluates each free-text report and returns a score indicating the probability that it contains oncologically relevant information. The best-performing algorithm among the eight analyzed in this study was LightGBM, which reached an F1 score of 98.9%. Using the chosen classifier, we shortened the time needed for case evaluation, improving the timeliness of cancer statistics.
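The evaluation step described in the abstract (a per-report probability score compared against operator labels and summarized as an F1 score) can be illustrated with a minimal sketch. This is not the authors' code: the function name `f1_at_threshold`, the 0.5 cut-off, and the six example scores and labels are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def f1_at_threshold(scores: Sequence[float],
                    labels: Sequence[int],
                    threshold: float = 0.5) -> float:
    """F1 score when reports with score >= threshold are flagged as relevant.

    scores: classifier outputs in [0, 1], one per report.
    labels: operator labels, 1 = oncologically relevant, 0 = not relevant.
    The 0.5 default threshold is an assumption, not a value from the paper.
    """
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores for six reports and their operator labels.
scores = [0.97, 0.85, 0.40, 0.10, 0.65, 0.05]
labels = [1, 1, 1, 0, 0, 0]
print(f1_at_threshold(scores, labels))
```

In this toy example one relevant report is missed and one irrelevant report is flagged, so precision and recall are both 2/3. In practice the threshold would be tuned to the registry's tolerance for missed cases versus extra reading time.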
Pages: 5