A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection

Cited by: 0
Authors
Chaofan Yang
Guanjun Liu
Chungang Yan
Changjun Jiang
Affiliations
[1] Tongji University, Department of Computer Science
[2] Tongji University, Shanghai Electronic Transactions and Information Service Collaborative Innovation Center
Source
Science China Information Sciences | 2021, Vol. 64
Keywords
ensemble learning; AdaBoost; clustering; misclassification degree; transaction fraud detection;
DOI
Not available
Abstract
AdaBoost is a well-known ensemble learning method that has been applied successfully in many fields. Existing studies show that AdaBoost is sensitive to noisy points, which degrades its classification performance. The main reason is that it increases the weights of all misclassified samples (including noisy points) in the same way, so the influence of noisy points can hardly be weakened. In this paper, a clustering algorithm is used to dynamically identify noisy points during the iterations. More precisely, in every iteration we compute a misclassification degree for every cluster, which is used to decide whether a misclassified sample is a noisy point in the current iteration. Furthermore, we propose a flexible method to update the weights of the misclassified samples. Experimental results on 22 public datasets show that our method outperforms state-of-the-art methods including AdaBoost, AdaCost, LogitBoost, and SPLBoost. We also apply our method to transaction fraud detection, and experiments on a large real-world transaction dataset further illustrate its good performance.
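To make the procedure above concrete, the following minimal Python sketch shows one way the clustering-based flexible weighting could be realized. The per-cluster misclassification degree (taken here as the weighted error fraction within each cluster), the noise_threshold parameter, the 0.5 damping factor for suspected noisy points, and the helper name flexible_adaboost are all illustrative assumptions for this sketch; the paper's exact formulas may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def flexible_adaboost(X, y, n_rounds=50, n_clusters=10, noise_threshold=0.2):
    """Sketch of a clustering-aware AdaBoost variant; labels y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                     # uniform initial sample weights
    learners, alphas = [], []

    for _ in range(n_rounds):
        # Weak learner: a decision stump trained on the current weights.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y            # boolean mask of errors

        err = float(np.dot(w, miss))
        if err <= 0.0 or err >= 0.5:            # perfect or too-weak learner
            break
        alpha = 0.5 * np.log((1.0 - err) / err)

        # Cluster the samples and compute a per-cluster "misclassification
        # degree" (assumed here: weighted fraction of errors in the cluster).
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
        degree = np.zeros(n)
        for c in range(n_clusters):
            in_c = labels == c
            degree[in_c] = np.dot(w[in_c], miss[in_c]) / max(w[in_c].sum(), 1e-12)

        # Flexible update: a misclassified sample sitting in a mostly
        # well-classified cluster is treated as a suspected noisy point,
        # so its weight grows less aggressively (assumed damping of 0.5).
        suspected_noise = miss & (degree < noise_threshold)
        step = np.where(suspected_noise, 0.5 * alpha, alpha)
        w *= np.exp(step * np.where(miss, 1.0, -1.0))
        w /= w.sum()                            # renormalize the distribution

        learners.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        votes = sum(a * h.predict(X_new) for a, h in zip(alphas, learners))
        return np.sign(votes)

    return predict
```

The design intuition is that a misclassified sample lying in a mostly well-classified cluster is more plausibly label noise than a genuinely hard sample, so its weight is increased less aggressively, which keeps later boosting rounds from overfitting the noise.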