Selection of diagnosis with oncologic relevance information from histopathology free text reports: A machine learning approach

Cited by: 3
Authors
Viscosi, Carmelo [1]
Fidelbo, Paolo [1]
Benedetto, Andrea [1]
Varvara, Massimo [1]
Ferrante, Margherita [1]
Affiliations
[1] Azienda Osped Univ Policlin G Rodolico San Marco, Registro Tumori Integrato Catania Messina Enna, UOC Igiene, Dipartimento GF Ingrassia, Via S Sofia 87, I-95123 Catania, Italy
Keywords
Machine learning; Binary classification; Natural language processing; Cancer registry; Automated classification; Pathology
DOI
10.1016/j.ijmedinf.2022.104714
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Histopathology reports are a primary data source for the case definition phase of a Cancer Registry. By reading the histopathology report, the operator evaluating an oncology case can define the morphology and topography of the cancer and validate the case with the highest basis of diagnosis. The key problem for the Catania-Messina-Enna Integrated Cancer Registry (RTI) is that these reports are written in natural language and that information relevant to cancer evaluation appears in only a small fraction of the total annual histopathology reports. In this population-based retrospective cohort study, we aim to reduce the working time RTI operators spend seeking and selecting the right information among the histopathology reports of the eastern Sicily population, by developing a binary classifier on a training set of labeled historical data and validating its output on a test set of labeled data created by the operators over the years. Using a machine learning algorithm, we built a classification model that evaluates each free-text report and returns a score indicating the probability that it contains oncologically relevant information. The best-performing algorithm among the eight analyzed in this study was LightGBM, which reached an F1 score of 98.9%. Using the chosen classifier, we shortened the time for case evaluation, improving the timeliness of cancer statistics.
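As a reading aid for the metric quoted in the abstract, the following is a minimal sketch of how an F1 score can be computed from per-report probability scores and operator labels. It is not the authors' code: the function name `f1_at_threshold`, the 0.5 cut-off, and the toy scores and labels are all assumptions made for illustration, and the values are not data from the study.

```python
from typing import Sequence


def f1_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> float:
    """F1 of the positive class when reports scoring >= threshold are flagged.

    scores: classifier outputs in [0, 1], one per report (hypothetical here).
    labels: 1 if the report is oncologically relevant, else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Turn each probability-like score into a binary decision.
    predicted = [1 if s >= threshold else 0 for s in scores]

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)

    if tp == 0:
        # No correctly flagged report: F1 is zero by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy values, invented for illustration only.
toy_scores = [0.97, 0.08, 0.81, 0.40, 0.62, 0.03]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_f1 = f1_at_threshold(toy_scores, toy_labels)
```

Lowering the threshold flags more reports for manual review, which tends to raise recall at the cost of precision; in a registry workflow, the cut-off would presumably be chosen according to how costly a missed relevant report is.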
Pages: 5
Related papers
50 in total
  • [41] An Adversorial Approach to Enable Re-Use of Machine Learning Models and Collaborative Research Efforts Using Synthetic Unstructured Free-Text Medical Data
    Kasthurirathne, Suranga N.
    Dexter, Gregory
    Grannis, Shaun J.
    MEDINFO 2019: HEALTH AND WELLBEING E-NETWORKS FOR ALL, 2019, 264 : 1510 - 1511
  • [42] Predicting post-contrast information from contrast agent free cardiac MRI using machine learning: Challenges and methods
    Abdulkareem, Musa
    Kenawy, Asmaa A.
    Rauseo, Elisa
    Lee, Aaron M.
    Sojoudi, Alireza
    Amir-Khalili, Alborz
    Lekadir, Karim
    Young, Alistair A.
    Barnes, Michael R.
    Barckow, Philipp
    Khanji, Mohammed Y.
    Aung, Nay
    Petersen, Steffen E.
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2022, 9
  • [43] An Efficient Approach to Predict Eye Diseases from Symptoms Using Machine Learning and Ranker-Based Feature Selection Methods
    Al Marouf, Ahmed
    Mottalib, Md Mozaharul
    Alhajj, Reda
    Rokne, Jon
    Jafarullah, Omar
    BIOENGINEERING-BASEL, 2023, 10 (01):
  • [44] Machine learning-based approach for efficient prediction of diagnosis, prognosis and lymph node metastasis of papillary thyroid carcinoma using adhesion signature selection
    Sun, Shuo
    Cai, Xiaoni
    Shao, Jinhai
    Zhang, Guimei
    Liu, Shan
    Wang, Hongsheng
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (12) : 20599 - 20623
  • [45] Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning-Based Text-Mining Approach
    Azzolina, Danila
    Bressan, Silvia
    Lorenzoni, Giulia
    Baldan, Giulia Andrea
    Bartolotta, Patrizia
    Scognamiglio, Federico
    Francavilla, Andrea
    Lanera, Corrado
    Da Dalt, Liviana
    Gregori, Dario
    JMIR PUBLIC HEALTH AND SURVEILLANCE, 2023, 9
  • [46] Enteric Methane Emission in Livestock Sector: Bibliometric Research from 1986 to 2024 with Text Mining and Topic Analysis Approach by Machine Learning Algorithms
    Evangelista, Chiara
    Milanesi, Marco
    Pietrucci, Daniele
    Chillemi, Giovanni
    Bernabucci, Umberto
    ANIMALS, 2024, 14 (21):
  • [47] Using Text Content From Coronary Catheterization Reports to Predict 5-Year Mortality Among Patients Undergoing Coronary Angiography: A Deep Learning Approach
    Li, Yu-Hsuan
    Lee, I-Te
    Chen, Yu-Wei
    Lin, Yow-Kuan
    Liu, Yu-Hsin
    Lai, Fei-Pei
    FRONTIERS IN CARDIOVASCULAR MEDICINE, 2022, 9
  • [48] Diagnosis of major depressive disorder by combining multimodal information from heart rate dynamics and serum proteomics using machine-learning algorithm
    Kim, Eun Young
    Lee, Min Young
    Kim, Se Hyun
    Ha, Kyooseob
    Kim, Kwang Pyo
    Ahn, Yong Min
    PROGRESS IN NEURO-PSYCHOPHARMACOLOGY & BIOLOGICAL PSYCHIATRY, 2017, 76 : 65 - 71
  • [49] Algebraic Bayesian Networks: Naive Frequentist Approach to Local Machine Learning Based on Imperfect Information from Social Media and Expert Estimates
    Kharitonov, Nikita A.
    Maximov, Anatoly G.
    Tulupyev, Alexander L.
    ARTIFICIAL INTELLIGENCE: (RCAI 2019), 2019, 1093 : 234 - 244
  • [50] Classifying Personality Traits from Text Data: A Machine Learning Approach Using Stochastic Gradient Descent for Simplified Jungian Typology-Based Assessment Tool
    Muliawati, Tri Hadiah
    Swandaru, Tina Rumy
    Kusumaningtyas, Entin Martiana
    Bimantoko, Iqbal
    2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024, 2024, : 528 - 533