Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach

Cited by: 3
Authors
Viscosi, Carmelo [1]
Fidelbo, Paolo [1]
Benedetto, Andrea [1]
Varvara, Massimo [1]
Ferrante, Margherita [1]
Affiliations
[1] Azienda Osped Univ Policlin G Rodolico San Marco, Registro Tumori Integrato Catania Messina Enna, UOC Igiene, Dipartimento GF Ingrassia, Via S Sofia 87, I-95123 Catania, Italy
Keywords
Machine learning; Binary classification; Natural language processing; Cancer registry; Automated classification; Pathology
DOI
10.1016/j.ijmedinf.2022.104714
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Histopathology reports are a primary data source for the case definition phase of a Cancer Registry. By reading the histopathology report, the operator who evaluates an oncology case can define the morphology and topography of the cancer and validate the case with the highest basis of diagnosis. The key problem for the Catania-Messina-Enna Integrated Cancer Registry (RTI) is that these reports are written in natural language and that information relevant to cancer evaluation appears in only a small part of the total annual histopathology reports. In this population-based retrospective cohort study, we aim to reduce the working time that RTI operators spend seeking and selecting the right information among the histopathology reports of the eastern Sicily population. To do so, we developed a binary classifier on a training set of labeled historical data and validated its output on a test set of labeled data created by the operators over the years. Using a machine learning algorithm, we built a classification model that evaluates each free-text report and returns a score indicating the probability that it contains oncologically relevant information. The best-performing algorithm among the eight analyzed in this study was LightGBM, which reached an F1 score of 98.9%. Using the chosen classifier, we shortened the time for case evaluation, improving the timeliness of cancer statistics.
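The abstract describes a model that scores each free-text report for oncologic relevance and is judged by its F1 score. As a rough illustration of that idea only, the sketch below trains a tiny multinomial naive Bayes scorer and computes F1 in plain Python. It is not the authors' pipeline: naive Bayes stands in for LightGBM, the tokenizer is deliberately naive, and the example reports, the 0.5 threshold, and all names (`NaiveBayesScorer`, `f1_score`, `train_reports`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a report and split on whitespace (deliberately naive)."""
    return text.lower().split()


class NaiveBayesScorer:
    """Tiny multinomial naive Bayes; a stand-in, not the paper's LightGBM model."""

    def fit(self, reports, labels):
        # Per-class word counts; label 1 = oncologically relevant, 0 = not.
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(reports, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def score(self, text):
        """Return the model's estimate of P(relevant | text)."""
        log_odds = math.log(self.class_counts[1] / self.class_counts[0])
        totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        vocab_size = len(self.vocab)
        for word in tokenize(text):
            if word not in self.vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            p1 = (self.word_counts[1][word] + 1) / (totals[1] + vocab_size)
            p0 = (self.word_counts[0][word] + 1) / (totals[0] + vocab_size)
            log_odds += math.log(p1 / p0)
        return 1.0 / (1.0 + math.exp(-log_odds))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy reports, invented for illustration (not registry data).
train_reports = [
    "invasive ductal carcinoma of the breast",
    "adenocarcinoma of the colon moderately differentiated",
    "chronic gastritis no evidence of malignancy",
    "benign nevus completely excised",
]
train_labels = [1, 1, 0, 0]

scorer = NaiveBayesScorer().fit(train_reports, train_labels)

test_reports = ["carcinoma of the colon", "chronic gastritis benign"]
test_labels = [1, 0]
# Assumed decision threshold of 0.5 on the returned score.
predictions = [1 if scorer.score(r) >= 0.5 else 0 for r in test_reports]
print("scores:", [round(scorer.score(r), 3) for r in test_reports])
print("F1:", f1_score(test_labels, predictions))
```

In a registry setting the score could also be used to rank reports so that operators read the most likely relevant ones first, rather than applying a single hard threshold.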
Pages: 5
Related papers
50 in total
  • [31] Large language model-based information extraction from free-text radiology reports: a scoping review protocol
    Reichenpfader, Daniel
    Muller, Henning
    Denecke, Kerstin
    BMJ OPEN, 2023, 13 (12)
  • [32] Extracting Clinical Information From Japanese Radiology Reports Using a 2-Stage Deep Learning Approach: Algorithm Development and Validation
    Sugimoto, Kento
    Wada, Shoya
    Konishi, Shozo
    Okada, Katsuki
    Manabe, Shirou
    Matsumura, Yasushi
    Takeda, Toshihiro
    JMIR MEDICAL INFORMATICS, 2023, 11
  • [33] Chi2-MI: A hybrid feature selection based machine learning approach in diagnosis of chronic kidney disease
    Dey, Samrat Kumar
    Uddin, Khandaker Mohammad Mohi
    Babu, Hafiz Md. Hasan
    Rahman, Md. Mahbubur
    Howlader, Arpita
    Uddin, K. M. Aslam
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2022, 16
  • [34] A MACHINE-LEARNING APPROACH FOR WORKFLOW IDENTIFICATION FROM LOW-LEVEL MONITORING INFORMATION
    Stein, Thorsten
    Stynes, Jeanne
    Kroeger, Reinhold
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON INTERNET TECHNOLOGIES AND APPLICATIONS (ITA 11), 2011 : 62 - 69
  • [35] Diagnosis of pes planus from X-ray images: Enhanced feature selection with deep learning and machine learning techniques
    Danaci, Cagla
    Avci, Derya
    Tuncer, Seda Arslan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 106
  • [36] Diagnostic Performance Evaluation of Deep Learning-Based Medical Text Modelling to Predict Pulmonary Diseases from Unstructured Radiology Free-Text Reports
    Shetty, Shashank
    Ananthanarayana, V. S.
    Mahale, Ajit
    ACTA INFORMATICA PRAGENSIA, 2023, 12 (02) : 260 - 274
  • [37] Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks
    Alawad, Mohammed
    Gao, Shang
    Qiu, John X.
    Yoon, Hong Jun
    Christian, J. Blair
    Penberthy, Lynne
    Mumphrey, Brent
    Wu, Xiao-Cheng
    Coyle, Linda
    Tourassi, Georgia
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2020, 27 (01) : 89 - 98
  • [38] Additional Value From Free-Text Diagnoses in Electronic Health Records: Hybrid Dictionary and Machine Learning Classification Study
    Mehra, Tarun
    Wekhof, Tobias
    Keller, Dagmar Iris
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [39] From the Semantic Point Cloud to Heritage-Building Information Modeling: A Semiautomatic Approach Exploiting Machine Learning
    Croce, Valeria
    Caroti, Gabriella
    De Luca, Livio
    Jacquot, Kevin
    Piemonte, Andrea
    Veron, Philippe
    REMOTE SENSING, 2021, 13 (03) : 1 - 34
  • [40] Selection of target-binding proteins from the information of weakly enriched phage display libraries by deep sequencing and machine learning
    Ito, Tomoyuki
    Nguyen, Thuy Duong
    Saito, Yutaka
    Kurumida, Yoichi
    Nakazawa, Hikaru
    Kawada, Sakiya
    Nishi, Hafumi
    Tsuda, Koji
    Kameda, Tomoshi
    Umetsu, Mitsuo
    MABS, 2023, 15 (01)