Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach

Cited by: 3
Authors
Viscosi, Carmelo [1]
Fidelbo, Paolo [1]
Benedetto, Andrea [1]
Varvara, Massimo [1]
Ferrante, Margherita [1]
Affiliation
[1] Azienda Osped Univ Policlin G Rodolico San Marco, Registro Tumori Integrato Catania Messina Enna, UOC Igiene, Dipartimento GF Ingrassia, Via S Sofia 87, I-95123 Catania, Italy
Keywords
Machine learning; Binary classification; Natural language processing; Cancer registry; Automated classification; Pathology
DOI
10.1016/j.ijmedinf.2022.104714
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Histopathology reports are a primary data source for the case definition phase of a Cancer Registry. By reading the histopathology report, the operator who evaluates an oncology case can define the morphology and topography of the cancer and validate the case with the highest basis of diagnosis. The key problem for the Catania-Messina-Enna Integrated Cancer Registry (RTI) is that these reports are written in natural language, and those containing information relevant to cancer evaluation are only a small fraction of the histopathology reports produced each year. In this population-based retrospective cohort study, we aim to reduce the time RTI operators spend searching for and selecting the right information among the histopathology reports of the eastern Sicily population. To do so, we developed a binary classifier on a training set of labeled historical data and validated its output on a test set of labeled data created by the operators over the years. Using a machine learning algorithm, we built a classification model that evaluates each free-text report and returns a score indicating the probability that it contains oncologically relevant information. The best-performing algorithm among the eight analyzed in this study was LightGBM, which reached an F1 score of 98.9%. Using the chosen classifier, we shortened the time needed for case evaluation, improving the timeliness of cancer statistics.
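As a rough illustration of how a probability score per report can be turned into a selection decision and then scored with F1, the sketch below uses plain Python. It is not the authors' code: the function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for the example, and the scores stand in for the output of whatever classifier is used (LightGBM in the study).

```python
def select_relevant(scores, threshold=0.5):
    """Turn per-report probability scores into 0/1 decisions.

    The 0.5 threshold is an assumption for this sketch, not a value from the study.
    """
    return [1 if s >= threshold else 0 for s in scores]


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 = report contains oncologically relevant information.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical classifier scores for the same eight reports.
scores = [0.92, 0.81, 0.40, 0.10, 0.65, 0.05, 0.77, 0.30]

predictions = select_relevant(scores)
f1 = f1_score(labels, predictions)
print(f"F1 on toy data: {f1:.2f}")
```

In this toy case three relevant reports are selected, one irrelevant report is selected by mistake, and one relevant report is missed, so precision and recall are both 0.75. In a registry setting the threshold would likely be chosen to favor recall, since a missed cancer report costs more than an extra report read by an operator; that trade-off is a design consideration of this sketch, not a claim from the abstract.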
Pages: 5