A data-centric machine learning approach to improve prediction of glioma grades using low-imbalance TCGA data

Times cited: 1
Authors
Sanchez-Marques, Raquel [1, 2]
Garcia, Vicente [3]
Sanchez, J. Salvador [4]
Affiliations
[1] Fdn Estatal Salud Infancia & Bienestar Social, Madrid 28029, Spain
[2] Ctr Invest Biomed Red Enfermedades Infecciosas CIB, Inst Salud Carlos III, CIBERINFEC, Madrid 28029, Spain
[3] Univ Autonoma Ciudad Juarez, Dept Elect & Comp Engn, Inst Ingn & Tecnol, Ciudad Juarez 32310, Mexico
[4] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana 12071, Spain
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / No. 1
Keywords
Data-centric machine learning; Glioma grade; Class imbalance; Feature ranking; Clinical factors; Molecular biomarkers; CLASSIFICATION; RADIOMICS
DOI
10.1038/s41598-024-68291-0
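Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09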
Abstract
Accurate prediction and grading of gliomas play a crucial role in evaluating brain tumor progression, assessing overall prognosis, and planning treatment. In addition to neuroimaging techniques, molecular biomarkers that can guide diagnosis, prognosis, and prediction of response to therapy have attracted researchers' interest, particularly in combination with machine learning and deep learning models. Most research in this field has been model-centric, that is, focused on finding better-performing algorithms. In practice, however, improving data quality can yield a better model. This study investigates a data-centric machine learning approach to determine its potential benefits in predicting glioma grades. We report six performance metrics to provide a complete picture of model performance. Experimental results indicate that standardization and oversampling of the minority class increase the prediction performance of four popular machine learning models and two classifier ensembles applied to a low-imbalance data set consisting of clinical factors and molecular biomarkers. The experiments also show that the two classifier ensembles significantly outperform three of the four standard prediction models. Furthermore, we conduct a comprehensive descriptive analysis of the glioma data set to identify relevant statistical characteristics, and we use four feature ranking algorithms to discover the most informative attributes.
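The abstract credits two data-centric steps with improving performance: standardization and oversampling of the minority class. The following is a minimal pure-Python sketch of those steps under stated assumptions, not the authors' pipeline. This record does not name the scaler or the oversampling method, so z-score scaling and random oversampling (duplicating minority-class rows) stand in for them. The function names `fit_standardizer`, `standardize`, and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def fit_standardizer(rows):
    """Compute per-feature mean and population std from training rows (assumed z-score scaling)."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for zero-variance features
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std feature by feature, using statistics fitted on training data."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def random_oversample(rows, labels, seed=0):
    """Hypothetical stand-in for the paper's oversampling: duplicate random
    rows of each smaller class until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


if __name__ == "__main__":
    # Toy, mildly imbalanced training split (made-up values, not TCGA data).
    X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
    y_train = [0, 0, 0, 1, 1]
    means, stds = fit_standardizer(X_train)
    Z_train = standardize(X_train, means, stds)
    X_bal, y_bal = random_oversample(Z_train, y_train)
    print(Counter(y_bal))
```

The sketch fits the scaling statistics on the training rows only and oversamples only the training split. This ordering keeps test-set information and duplicated rows out of the evaluation data. The paper's exact protocol is not stated in this record.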
Pages: 16