A data-centric machine learning approach to improve prediction of glioma grades using low-imbalance TCGA data

Cited by: 1
Authors
Sanchez-Marques, Raquel [1, 2]
Garcia, Vicente [3]
Sanchez, J. Salvador [4]
Affiliations
[1] Fdn Estatal Salud Infancia & Bienestar Social, Madrid 28029, Spain
[2] Ctr Invest Biomed Red Enfermedades Infecciosas CIB, Inst Salud Carlos III, CIBERINFEC, Madrid 28029, Spain
[3] Univ Autonoma Ciudad Juarez, Dept Elect & Comp Engn, Inst Ingn & Tecnol, Ciudad Juarez 32310, Mexico
[4] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana 12071, Spain
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Data-centric machine learning; Glioma grade; Class imbalance; Feature ranking; Clinical factors; Molecular biomarkers; CLASSIFICATION; RADIOMICS
DOI
10.1038/s41598-024-68291-0
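Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09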
Abstract
Accurate prediction and grading of gliomas play a crucial role in evaluating brain tumor progression, assessing overall prognosis, and planning treatment. In addition to neuroimaging techniques, the identification of molecular biomarkers that can guide diagnosis, prognosis, and prediction of the response to therapy has attracted researchers' interest in using them together with machine learning and deep learning models. Most of the research in this field has been model-centric, meaning it has focused on finding better-performing algorithms. However, in practice, improving data quality can result in a better model. This study investigates a data-centric machine learning approach to determine its potential benefits in predicting glioma grades. We report six performance metrics to provide a complete picture of model performance. Experimental results indicate that standardization and oversampling of the minority class increase the prediction performance of four popular machine learning models and two classifier ensembles applied to a low-imbalance data set consisting of clinical factors and molecular biomarkers. The experiments also show that the two classifier ensembles significantly outperform three of the four standard prediction models. Furthermore, we conduct a comprehensive descriptive analysis of the glioma data set to identify relevant statistical characteristics and discover the most informative attributes using four feature ranking algorithms.
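The two data-centric steps named in the abstract, standardization and enlarging the minority class, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy feature values, and the labels `low_grade` / `high_grade` are hypothetical, and plain random duplication is assumed as the oversampling strategy because the abstract does not say which method was used.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: (x - column mean) / column standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: two numeric features per patient, mildly imbalanced labels.
X = [[50.0, 1.0], [62.0, 0.0], [45.0, 1.0], [70.0, 0.0], [38.0, 1.0]]
y = ["low_grade", "high_grade", "low_grade", "high_grade", "low_grade"]

X_std = standardize(X)
X_bal, y_bal = oversample_minority(X_std, y)
```

In a real evaluation, the column means and standard deviations would normally be computed on the training split only, and oversampling would be applied to training data alone so that duplicated rows do not leak into the test set.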
Pages: 16
Related papers
50 in total
  • [31] RDET stacking classifier: a novel machine learning based approach for stroke prediction using imbalance data
    Rehman, Amjad
    Alam, Teg
    Mujahid, Muhammad
    Alamri, Faten S.
    Al Ghofaily, Bayan
    Saba, Tanzila
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [32] DCServCG: A data-centric service code generation using deep learning
    Alizadehsani, Zakieh
    Ghaemi, Hadi
    Shahraki, Amin
    Gonzalez-Briones, Alfonso
    Corchado, Juan M.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [33] Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators
    Chatarasi, Prasanth
    Kwon, Hyoukjun
    Parashar, Angshuman
    Pellauer, Michael
    Krishna, Tushar
    Sarkar, Vivek
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2022, 19 (01)
  • [34] Machine Learning Approach to Improve Satellite Orbit Prediction Accuracy Using Publicly Available Data
    Hao Peng
    Xiaoli Bai
    The Journal of the Astronautical Sciences, 2020, 67 : 762 - 793
  • [35] Machine Learning Approach to Improve Satellite Orbit Prediction Accuracy Using Publicly Available Data
    Peng, Hao
    Bai, Xiaoli
    JOURNAL OF THE ASTRONAUTICAL SCIENCES, 2020, 67 (02) : 762 - 793
  • [36] A Data-Centric Approach for Analyzing Large-Scale Deep Learning Applications
    Vineet, S. Sai
    Joseph, Natasha Meena
    Korgaonkar, Kunal
    Paul, Arnab K.
    PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, ICDCN 2023, 2023 : 282 - 283
  • [37] A Machine-Learning-Based Data-Centric Misbehavior Detection Model for Internet of Vehicles
    Sharma, Prinkle
    Liu, Hong
    IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (06) : 4991 - 4999
  • [38] Data-centric Engineering: integrating simulation, machine learning and statistics. Challenges and opportunities
    Pan, Indranil
    Mason, Lachlan R.
    Matar, Omar K.
    Chemical Engineering Science, 2022, 249
  • [39] What Is a Digital Twin? Experimental Design for a Data-Centric Machine Learning Perspective in Health
    Emmert-Streib, Frank
    Yli-Harja, Olli
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2022, 23 (21)
  • [40] Data-centric Engineering: integrating simulation, machine learning and statistics. Challenges and opportunities
    Pan, Indranil
    Mason, Lachlan R.
    Matar, Omar K.
    CHEMICAL ENGINEERING SCIENCE, 2022, 249