Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning

Times cited: 2
Authors
Li, Jiaxi [1]
Wang, Zhelong [1]
Wu, Lina [2]
Qiu, Sen [1]
Zhao, Hongyu [1]
Lin, Fang [1]
Zhang, Ke [1]
Affiliations
[1] Dalian University of Technology, School of Control Science and Engineering, Dalian 116024, People's Republic of China
[2] Liaoning Cancer Hospital & Institute, Shenyang 110042, People's Republic of China
Keywords
Training; Mathematical models; Ensemble learning; Task analysis; Costs; Support vector machines; Data models; Data incompleteness; class imbalance; physical fitness assessment; malignant tumor patients; multivariate imputation by chained equations; ensemble learning; MISSING DATA IMPUTATION; MULTIPLE IMPUTATION; PREHABILITATION; FRAMEWORK; HEALTH
DOI
10.1109/JBHI.2024.3376428
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Classifying incomplete and imbalanced data remains challenging because both issues can degrade classifier training, and we encountered both in our study of physical fitness assessment in patients. Moreover, fields such as healthcare place stricter demands on the accuracy of imputed values. To train a high-performance, accurate classifier despite these problems, we propose a novel approach that combines multivariate imputation by chained equations with ensemble learning (MICEEN) and addresses both problems simultaneously. Multivariate imputation by chained equations generates more accurate imputed values for the training set, which is then passed to an ensemble learner to build the predictor. In addition, missing values are deliberately introduced into minority-class samples, and the resulting incomplete samples are used to generate new minority-class samples, balancing the class distribution. We conduct extensive experiments on real-world datasets to evaluate the method and compare it with other state-of-the-art approaches. Experimental results on benchmark datasets and on self-collected physical fitness assessment datasets of tumor patients, at varying missing rates, demonstrate the advantages of the proposed method.
Pages: 3102-3113
Page count: 12