Comparative Evaluation of the Supervised Machine Learning Classification Methods and the Concept Drift Detection Methods in the Financial Business Problems

Cited by: 3
Authors
Pugliese, Victor Ulisses [1]
Costa, Renato Duarte [1]
Hirata, Celso Massaki [1]
Affiliations
[1] Inst Tecnol Aeronaut, Praca Marechal Eduardo Gomes 50, Sao Jose Dos Campos, Brazil
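Source
ENTERPRISE INFORMATION SYSTEMS, ICEIS 2020 | 2021 / Vol. 417
Keywords
Supervised learning; Concept drift; Ranking methods; Tests
DOI
10.1007/978-3-030-75418-1_13
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract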
Machine learning methods are key tools for supporting decision making in financial business problems, such as risk analysis, fraud detection, and credit-granting evaluations, reducing time and effort and increasing accuracy. Supervised machine learning classification methods learn patterns in data to improve prediction. In the long term, the data patterns may change in a process known as concept drift, and these changes require retraining the classification methods to maintain their accuracy. We conducted a comparative study using twelve classification methods and seven concept drift detection methods. The evaluated classification methods are Gaussian and Incremental Naive Bayes, Logistic Regression, Support Vector Classifier, k-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting, XGBoost, Multilayer Perceptron, Stochastic Gradient Descent, and Hoeffding Tree. The analyzed concept drift detection methods are ADWIN, DDM, EDDM, HDDMa, HDDMw, KSWIN, and Page Hinkley. We used the next-generation hyperparameter optimization framework Optuna, applied the non-parametric Friedman test to test the hypotheses, and used the Nemenyi test as a post-hoc test to validate the results. We used five datasets in the financial domain. With F1 and AUROC scores as the performance metrics, XGBoost outperformed the other methods in the classification experiments. In the data stream experiments with concept drift, using accuracy as the performance metric, Hoeffding Tree and XGBoost showed the best results with the HDDMw, KSWIN, and ADWIN concept drift detection methods. We conclude that XGBoost with HDDMw is the recommended combination for financial datasets that exhibit concept drift.
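As an illustration of how a drift detector such as Page Hinkley can signal that a classifier needs retraining, the following is a minimal sketch in plain Python. It is not the authors' implementation, and the abstract does not state which library they used. The class layout, the parameter names `delta`, `threshold`, and `min_samples`, their default values, and the synthetic error stream are all assumptions made for this example.

```python
class PageHinkley:
    """Minimal Page-Hinkley test for an upward shift in a stream's mean.

    Illustrative sketch only: the parameter names and defaults are assumptions.
    """

    def __init__(self, delta=0.005, threshold=5.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm level (often written as lambda)
        self.min_samples = min_samples  # warm-up before any alarm is allowed
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0      # running mean of the observed values
        self.cum = 0.0       # cumulative deviation from the running mean
        self.cum_min = 0.0   # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True when drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.cum_min > self.threshold)


def first_drift_index(errors, detector):
    """Return the 0-based index of the first alarm, or None if there is none."""
    for i, err in enumerate(errors):
        if detector.update(err):
            return i
    return None


# Synthetic 0/1 error stream (1 = misclassified example), purely hypothetical:
# a 10% error rate for 200 examples, then a 50% error rate after the concept changes.
stable = [1 if i % 10 == 0 else 0 for i in range(200)]
drifted = [1 if i % 2 == 0 else 0 for i in range(200)]
errors = stable + drifted

first_drift = first_drift_index(errors, PageHinkley())
print("first drift signalled at index:", first_drift)
```

In a data stream experiment, an alarm like this would typically trigger retraining the classifier on recent data and resetting the detector with `reset()`. The other detectors named in the abstract (ADWIN, DDM, EDDM, HDDMa, HDDMw, KSWIN) use different statistics but play the same role.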
Pages: 268-292
Number of pages: 25