Handling Class Imbalance In Direct Marketing Dataset Using A Hybrid Data and Algorithmic Level Solutions

Cited by: 0
Authors
Alhakbani, Haya Abdullah [1]
al-Rifaie, Mohammad Majid [1]
Affiliations
[1] Univ London, Goldsmiths Coll, Dept Comp, London SE14 6NW, England
Source
PROCEEDINGS OF THE 2016 SAI COMPUTING CONFERENCE (SAI) | 2016
Keywords
Imbalance data; minority class; grid search; sampling; SMOTE; classification; SVM;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Class imbalance is a major problem in machine learning. It occurs when the number of instances in the majority class is significantly greater than the number of instances in the minority class. The problem recurs in most datasets, including the direct marketing dataset used in this paper. In direct marketing, businesses are interested in identifying potential buyers, and charities wish to identify potential givers. Several solutions have been suggested in the literature to address this problem, among them data-level techniques, algorithmic-level techniques and combinations of both. In this paper, a model is proposed to handle imbalanced data using a Hybrid of Data-level and Algorithmic-level solutions (HybridDA), which involves oversampling the minority class, undersampling the majority class and, additionally, optimising the cost parameter, the gamma and the kernel type of Support Vector Machines (SVM) using a grid search. The proposed model performed competitively compared with other models on the same dataset. The dataset used in this work consists of real-world data collected from a Portuguese marketing campaign for bank-deposit subscriptions and is available from the University of California, Irvine (UCI) Machine Learning Repository.
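The data-level half of the approach described in the abstract (oversampling the minority class and undersampling the majority class) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the neighbour count k and the balancing target are all assumptions made for illustration, and the interpolation step only follows the general SMOTE idea listed in the keywords. The algorithmic-level half (the grid search over the SVM cost, gamma and kernel type) is not shown here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, SMOTE-style: each one is interpolated
    between a minority sample and one of its k nearest minority neighbours.
    Hypothetical helper; requires at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples (no replacement)."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


def rebalance(majority, minority, target, k=3, seed=0):
    """Bring both classes to `target` samples: undersample the majority
    class and top up the minority class with synthetic points.
    The target size is an assumption, not a value taken from the paper."""
    rng = random.Random(seed)
    kept = random_undersample(majority, min(target, len(majority)), rng)
    n_new = max(0, target - len(minority))
    boosted = list(minority) + smote_like_oversample(minority, n_new, k, rng)
    return kept, boosted


# Toy two-feature data (invented): 90 majority and 10 minority samples.
majority = [(float(i), float(i % 7)) for i in range(90)]
minority = [(100.0 + i, 50.0 + i) for i in range(10)]
kept, boosted = rebalance(majority, minority, target=30)
print(len(kept), len(boosted))
```

In a full pipeline the rebalanced training set would then be passed to a classifier whose hyperparameters are tuned separately; resampling is normally applied to the training split only, so that evaluation reflects the original class distribution.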
Pages: 446-451
Number of pages: 6