Undersampling Instance Selection for Hybrid and Incomplete Imbalanced Data

Times cited: 0
Authors
Camacho-Nieto, Oscar [1 ]
Yanez-Marquez, Cornelio [2 ]
Villuendas-Rey, Yenny [1 ]
Affiliations
[1] Inst Politecn Nacl, CIDETEC, Cdmx, Mexico
[2] Inst Politecn Nacl, CIC, Cdmx, Mexico
Keywords
undersampling; imbalanced data; hybrid and incomplete data; software tool; data sets; classification; algorithms; ensembles; KEEL
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper proposes a novel undersampling method for dealing with imbalanced datasets. The proposal is based on a novel instance importance measure, also introduced in this paper, and is able to balance hybrid and incomplete data. Numerical experiments on well-known imbalanced datasets show that the proposed undersampling algorithm outperforms other state-of-the-art algorithms.
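The abstract does not define the instance importance measure, so the sketch below does not reproduce the authors' method. As a hedged illustration of what undersampling hybrid (mixed numeric and nominal) and incomplete data involves, it substitutes two well-known techniques: the Heterogeneous Euclidean-Overlap Metric (HEOM), which compares numeric attributes by range-normalized difference and nominal attributes by overlap, and treats a missing value as maximal distance; and a NearMiss-1-style rule that keeps the majority instances with the smallest mean distance to their k nearest minority instances. The function names (`heom`, `attribute_ranges`, `nearmiss1_undersample`), the binary minority-vs-rest setup, and the default `k=3` are hypothetical choices for this example.

```python
import math
from typing import List, Optional, Sequence, Union

# A cell is a number, a nominal label, or None for a missing value.
Value = Optional[Union[float, str]]


def attribute_ranges(rows: Sequence[Sequence[Value]], numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric attribute, ignoring missing values."""
    ranges = []
    for i, is_num in enumerate(numeric):
        vals = [r[i] for r in rows if r[i] is not None] if is_num else []
        ranges.append(max(vals) - min(vals) if vals else 0.0)
    return ranges


def heom(a: Sequence[Value], b: Sequence[Value],
         numeric: Sequence[bool], ranges: Sequence[float]) -> float:
    """Heterogeneous Euclidean-Overlap Metric between two instances."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            d = 1.0  # missing value: maximal per-attribute distance
        elif numeric[i]:
            r = ranges[i]
            d = abs(x - y) / r if r > 0 else 0.0  # range-normalized difference
        else:
            d = 0.0 if x == y else 1.0  # overlap metric for nominal attributes
        total += d * d
    return math.sqrt(total)


def nearmiss1_undersample(X: Sequence[Sequence[Value]], y: Sequence[str],
                          numeric: Sequence[bool], minority: str, k: int = 3) -> List[int]:
    """Return sorted indices of a balanced subset (NearMiss-1-style, HEOM distance).

    Hypothetical illustration only; not the importance measure from the paper.
    """
    ranges = attribute_ranges(X, numeric)
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    majority_idx = [i for i, lab in enumerate(y) if lab != minority]
    if not minority_idx:
        raise ValueError("no instances of the minority class")
    scored = []
    for i in majority_idx:
        # Mean distance to the k nearest minority instances (fewer if k is larger).
        dists = sorted(heom(X[i], X[j], numeric, ranges) for j in minority_idx)
        nearest = dists[:k]
        scored.append((sum(nearest) / len(nearest), i))
    scored.sort()  # smallest mean distance first; ties broken by index
    keep_majority = [i for _, i in scored[:len(minority_idx)]]
    return sorted(minority_idx + keep_majority)


if __name__ == "__main__":
    X = [[1.0, "a"], [2.0, "b"], [1.1, None], [5.0, "a"], [9.0, "b"], [1.2, "a"]]
    y = ["min", "min", "maj", "maj", "maj", "maj"]
    print(nearmiss1_undersample(X, y, numeric=[True, False], minority="min"))
    # → [0, 1, 3, 5]
```

Treating a missing value as maximal distance is a conservative choice; other dissimilarities for mixed data, such as Gower's coefficient, instead skip an attribute when either value is missing.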
Pages: 698-719
Number of pages: 22