Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning

Cited by: 3
Authors
Li, Jiaxi [1 ]
Wang, Zhelong [1 ]
Wu, Lina [2 ]
Qiu, Sen [1 ]
Zhao, Hongyu [1 ]
Lin, Fang [1 ]
Zhang, Ke [1 ]
Affiliations
[1] Dalian Univ Technol, Sch Control Sci & Engn, Dalian 116024, Peoples R China
[2] Liaoning Canc Hosp & Inst, Shenyang 110042, Peoples R China
Keywords
Training; Mathematical models; Ensemble learning; Task analysis; Costs; Support vector machines; Data models; Data incompleteness; class imbalance; physical fitness assessment; malignant tumor patients; multivariate imputation by chained equations; ensemble learning; MISSING DATA IMPUTATION; MULTIPLE IMPUTATION; PREHABILITATION; FRAMEWORK; HEALTH;
DOI
10.1109/JBHI.2024.3376428
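Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract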
Classifying incomplete and imbalanced data remains challenging because both issues can degrade classifier training; we encountered both in our study on the physical fitness assessment of patients. In fields such as healthcare, the accuracy requirements for imputed values are also higher. To train a high-performance classifier, we propose a novel approach that combines multivariate imputation by chained equations with ensemble learning (MICEEN) and addresses the two problems simultaneously. Multivariate imputation by chained equations generates more accurate imputed values for the training set, which is then passed to ensemble learning to build a predictor. In addition, missing values are introduced into the minority classes and then used to generate new minority-class samples, balancing the class distribution. We perform extensive experiments on real-world datasets to assess our method and compare it with other state-of-the-art approaches. Experimental results on benchmark datasets and on self-collected physical fitness assessment datasets of tumor patients, with varying missing rates, demonstrate the advantages of the proposed method.
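The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' MICEEN implementation: it assumes scikit-learn's `IterativeImputer` as a MICE-style imputer and a random forest as a stand-in ensemble, and the helper names `oversample_by_masking` and `impute_and_train`, the `mask_rate` value, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier


def oversample_by_masking(X, y, minority_label, n_new, mask_rate=0.3, seed=0):
    """Hypothetical helper: copy minority rows and blank out random entries.

    Once the blanked entries are imputed, each copy becomes a new
    minority-class sample that differs from the row it was copied from.
    """
    rng = np.random.default_rng(seed)
    minority = X[y == minority_label]
    picks = minority[rng.integers(0, len(minority), size=n_new)].copy()
    picks[rng.random(picks.shape) < mask_rate] = np.nan
    X_aug = np.vstack([X, picks])
    y_aug = np.concatenate([y, np.full(n_new, minority_label)])
    return X_aug, y_aug


def impute_and_train(X, y, seed=0):
    """Hypothetical helper: chained-equation imputation, then an ensemble classifier."""
    # sample_posterior=True draws imputed values instead of using point predictions.
    imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=seed)
    X_filled = imputer.fit_transform(X)
    # Random forest is an assumed stand-in for the ensemble learner.
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_filled, y)
    return imputer, clf, X_filled


# Synthetic, imbalanced (90 vs 10) data with roughly 10% of entries missing.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (90, 4)), rng.normal(2, 1, (10, 4))])
y = np.array([0] * 90 + [1] * 10)
X[rng.random(X.shape) < 0.1] = np.nan

X_aug, y_aug = oversample_by_masking(X, y, minority_label=1, n_new=80)
imputer, clf, X_filled = impute_and_train(X_aug, y_aug)
```

In this sketch the imputer is fitted on the augmented training set only; at prediction time, `imputer.transform` would be applied to incomplete test rows before calling `clf.predict`.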
Pages: 3102-3113
Number of pages: 12
Related papers
47 in total
  • [1] Alcalá-Fdez J, 2011, J MULT-VALUED LOG S, V17, P255
  • [2] [Anonymous], 2017, P INT C COMP COMM TE
  • [3] Imputation of missing data with class imbalance using conditional generative adversarial networks
    Awan, Saqib Ejaz
    Bennamoun, Mohammed
    Sohel, Ferdous
    Sanfilippo, Frank
    Dwivedi, Girish
    [J]. NEUROCOMPUTING, 2021, 453 : 164 - 171
  • [4] Teaching of Independent Exercises for Prehabilitation in Breast Cancer
    Baima, Jennifer
    Reynolds, Sara-Grace
    Edmiston, Kathryn
    Larkin, Anne
    Ward, B. Marie
    O'Connor, Ashling
    [J]. JOURNAL OF CANCER EDUCATION, 2017, 32 (02) : 252 - 256
  • [5] A Keyword-Enhanced Approach to Handle Class Imbalance in Clinical Text Classification
    Blanchard, Andrew E.
    Gao, Shang
    Yoon, Hong-Jun
    Christian, J. Blair
    Durbin, Eric B.
    Wu, Xiao-Cheng
    Stroup, Antoinette
    Doherty, Jennifer
    Schwartz, Stephen M.
    Wiggins, Charles
    Coyle, Linda
    Penberthy, Lynne
    Tourassi, Georgia D.
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (06) : 2796 - 2803
  • [6] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 : 321 - 357
  • [7] Framework PEACE: An organizational model for examining physical exercise across the cancer experience
    Courneya, KS
    Friedenreich, CM
    [J]. ANNALS OF BEHAVIORAL MEDICINE, 2001, 23 (04) : 263 - 272
  • [8] Multiobjective Support Vector Machines: Handling Class Imbalance With Pareto Optimality
    Datta, Shounak
    Das, Swagatam
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (05) : 1602 - 1608
  • [9] Dietz J H Jr, 1980, Curr Probl Cancer, V5, P1
  • [10] Generative adversarial networks for imputing missing data for big data clinical research
    Dong, Weinan
    Fong, Daniel Yee Tak
    Yoon, Jin-sun
    Wan, Eric Yuk Fai
    Bedford, Laura Elizabeth
    Tang, Eric Ho Man
    Lam, Cindy Lo Kuen
    [J]. BMC MEDICAL RESEARCH METHODOLOGY, 2021, 21 (01)