A data-centric machine learning approach to improve prediction of glioma grades using low-imbalance TCGA data

Times cited: 1
Authors
Sanchez-Marques, Raquel [1, 2]
Garcia, Vicente [3]
Sanchez, J. Salvador [4]
Affiliations
[1] Fdn Estatal Salud Infancia & Bienestar Social, Madrid 28029, Spain
[2] Ctr Invest Biomed Red Enfermedades Infecciosas CIB, Inst Salud Carlos III, CIBERINFEC, Madrid 28029, Spain
[3] Univ Autonoma Ciudad Juarez, Dept Elect & Comp Engn, Inst Ingn & Tecnol, Ciudad Juarez 32310, Mexico
[4] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana 12071, Spain
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Data-centric machine learning; Glioma grade; Class imbalance; Feature ranking; Clinical factors; Molecular biomarkers; CLASSIFICATION; RADIOMICS;
DOI
10.1038/s41598-024-68291-0
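Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];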
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Accurate prediction and grading of gliomas play a crucial role in evaluating brain tumor progression, assessing overall prognosis, and planning treatment. Alongside neuroimaging techniques, researchers have become increasingly interested in identifying molecular biomarkers that can guide diagnosis, prognosis, and prediction of response to therapy, and in using these biomarkers together with machine learning and deep learning models. Most research in this field has been model-centric, that is, focused on finding better-performing algorithms. In practice, however, improving data quality can yield a better model. This study investigates a data-centric machine learning approach to determine its potential benefits in predicting glioma grades. We report six performance metrics to provide a complete picture of model performance. Experimental results indicate that standardization and oversampling of the minority class improve the prediction performance of four popular machine learning models and two classifier ensembles applied to a low-imbalance data set consisting of clinical factors and molecular biomarkers. The experiments also show that the two classifier ensembles significantly outperform three of the four standard prediction models. Furthermore, we conduct a comprehensive descriptive analysis of the glioma data set to identify relevant statistical characteristics and use four feature ranking algorithms to discover the most informative attributes.
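The abstract credits two data-centric preprocessing steps, standardization and oversampling of the minority class, with the performance gains. The Python below is a minimal sketch of those two steps under stated assumptions: it uses plain z-score standardization and random oversampling with replacement, whereas the paper's exact scaler and oversampling method (for example SMOTE) may differ. The function names `standardize` and `random_oversample` and the toy data are hypothetical. The scaling statistics are fitted on the training split only, and only the training split is oversampled, so test rows never leak into either step.

```python
import random
from statistics import mean, pstdev


def standardize(train, test):
    """Z-score each feature using the mean/std of the training rows only.

    Assumption: plain z-score standardization; the paper's exact scaler may differ.
    """
    n_features = len(train[0])
    means = [mean([row[j] for row in train]) for j in range(n_features)]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [pstdev([row[j] for row in train]) or 1.0 for j in range(n_features)]

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]

    return scale(train), scale(test)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    matches the largest class count.

    Assumption: random oversampling with replacement, not SMOTE or another
    synthetic-sample method.
    """
    rng = random.Random(seed)
    counts = {label: y.count(label) for label in set(y)}
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: two numeric features, class 1 is the minority class.
X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
y_train = [0, 0, 0, 1, 1]
X_test = [[3.0, 30.0]]

X_train_std, X_test_std = standardize(X_train, X_test)
X_bal, y_bal = random_oversample(X_train_std, y_train)  # training split only
print(y_bal.count(0), y_bal.count(1))  # 3 3
print(X_test_std)  # [[0.0, 0.0]]
```

In practice, scikit-learn's `StandardScaler` and imbalanced-learn's `RandomOverSampler` (or `SMOTE`) cover the same two steps; the plain-Python version only makes the mechanics and the train-only fitting explicit.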
Pages: 16