A data-centric machine learning approach to improve prediction of glioma grades using low-imbalance TCGA data

Cited by: 1
Authors
Sanchez-Marques, Raquel [1, 2]
Garcia, Vicente [3 ]
Sanchez, J. Salvador [4 ]
Affiliations
[1] Fdn Estatal Salud Infancia & Bienestar Social, Madrid 28029, Spain
[2] Ctr Invest Biomed Red Enfermedades Infecciosas CIB, Inst Salud Carlos III, CIBERINFEC, Madrid 28029, Spain
[3] Univ Autonoma Ciudad Juarez, Dept Elect & Comp Engn, Inst Ingn & Tecnol, Ciudad Juarez 32310, Mexico
[4] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana 12071, Spain
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Keywords
Data-centric machine learning; Glioma grade; Class imbalance; Feature ranking; Clinical factors; Molecular biomarkers; CLASSIFICATION; RADIOMICS
DOI
10.1038/s41598-024-68291-0
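Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract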
Accurate prediction and grading of gliomas play a crucial role in evaluating brain tumor progression, assessing overall prognosis, and planning treatment. Beyond neuroimaging techniques, researchers have become interested in identifying molecular biomarkers that can guide diagnosis, prognosis, and prediction of the response to therapy, and in using them together with machine learning and deep learning models. Most research in this field has been model-centric, meaning that it has focused on finding better-performing algorithms. In practice, however, improving data quality can also result in a better model. This study investigates a data-centric machine learning approach to determine its potential benefits in predicting glioma grades. We report six performance metrics to provide a complete picture of model performance. Experimental results indicate that standardization and oversampling the minority class increase the prediction performance of four popular machine learning models and two classifier ensembles applied to a low-imbalance data set consisting of clinical factors and molecular biomarkers. The experiments also show that the two classifier ensembles significantly outperform three of the four standard prediction models. Furthermore, we conduct a comprehensive descriptive analysis of the glioma data set to identify relevant statistical characteristics and discover the most informative attributes using four feature ranking algorithms.
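The two data-level steps named in the abstract, standardization and enlarging the minority class, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `standardize` and `oversample_minority`, the use of plain random duplication as the oversampling method, and the toy feature values and 0/1 grade labels are all assumptions made for illustration.

```python
import random
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: two made-up features per patient, labels 0 and 1
# standing in for two glioma grades (class 1 is the minority).
X = [[51.0, 1.0], [38.0, 0.0], [35.0, 0.0], [62.0, 1.0], [46.0, 0.0]]
y = [0, 0, 0, 1, 1]

X_std = standardize(X)
X_bal, y_bal = oversample_minority(X_std, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In an actual evaluation, both steps would normally be fitted on the training portion only, so that duplicated rows and scaling statistics do not leak into the test data.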
Pages: 16
Related papers
50 items in total
  • [21] Model and data-centric machine learning algorithms to address data scarcity for failure identification
    Khan, Lareb Zar
    Pedro, Joao
    Costa, Nelson
    Sgambelluri, Andrea
    Napoli, Antonio
    Sambo, Nicola
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2024, 16 (03) : 369 - 381
  • [22] Data-centric framework for crystal structure identification in atomistic simulations using machine learning
    Chung, Heejung W.
    Freitas, Rodrigo
    Cheon, Gowoon
    Reed, Evan J.
    PHYSICAL REVIEW MATERIALS, 2022, 6 (04)
  • [23] A Data-Centric Approach for Reducing Carbon Emissions in Deep Learning
    Anselmo, Martin
    Vitali, Monica
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2023, 2023, 13901 : 123 - 138
  • [24] Better, Not Just More: Data-centric machine learning for Earth observation
    Roscher, Ribana
    Russwurm, Marc
    Gevaert, Caroline
    Kampffmeyer, Michael
    Dos Santos, Jefersson A.
    Vakalopoulou, Maria
    Haensch, Ronny
    Hansen, Stine
    Nogueira, Keiller
    Prexl, Jonathan
    Tuia, Devis
    IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, 2024, 12 (04) : 335 - 355
  • [25] Systematic review of data-centric approaches in artificial intelligence and machine learning
    Singh, P.
    DATA SCIENCE AND MANAGEMENT, 2023, 6 (03) : 144 - 157
  • [26] Natural soils' shear strength prediction: A morphological data-centric approach
    Omar, Maher
    Arab, Mohamed G.
    Alotaibi, Emran
    Alshibli, Khalid A.
    Shanableh, Abdallah
    Elmehdi, Hussein
    Malkawi, Dima A. Hussien
    Tahmaz, Ali
    SOILS AND FOUNDATIONS, 2024, 64 (06)
  • [27] A DATA-CENTRIC APPROACH FOR RAPID DATASET GENERATION USING ITERATIVE LEARNING AND SPARSE ANNOTATIONS
    Ferreira de Carvalho, Osmar Luiz
    de Albuquerque, Anesmar Olino
    Luiz, Argelica Saiaka
    Guimaraes Ferreira, Pedro Henrique
    Mou, Lichao
    Guerreiro e Silva, Daniel
    de Carvalho Junior, Osmar Abilio
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023 : 5650 - 5653
  • [28] Short-term water demand forecasting using data-centric machine learning approaches
    Liu, Guoxuan
    Savic, Dragan
    Fu, Guangtao
    JOURNAL OF HYDROINFORMATICS, 2023, 25 (03) : 895 - 911
  • [29] Towards Data-Centric What-If Analysis for Native Machine Learning Pipelines
    Grafberger, Stefan
    Groth, Paul
    Schelter, Sebastian
    PROCEEDINGS OF THE 6TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2022, 2022
  • [30] A Data-Centric Machine Learning Methodology: Application on Predictive Maintenance of Wind Turbines
    Garan, Maryna
    Tidriri, Khaoula
    Kovalenko, Iaroslav
    ENERGIES, 2022, 15 (03)