Ensemble learning method for classification: Integrating data envelopment analysis with machine learning

Cited by: 1
Authors
An, Qingxian [1, 2]
Huang, Siwei [1]
Han, Yuxuan [1]
Zhu, You [3, 4]
Affiliations
[1] Cent South Univ, Sch Business, Changsha 410083, Peoples R China
[2] Hefei Univ Technol, Sch Econ, Hefei 230601, Peoples R China
[3] Hunan Univ, Business Sch, Changsha 410082, Peoples R China
[4] Hunan Prov Key Lab Philosophy & Social Sci Ind Dig, Changsha 410082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Ensemble learning; Data envelopment analysis; Classifier; Large dataset; STATISTICAL COMPARISONS; CLASSIFIERS; EFFICIENCY; DEA;
DOI
10.1016/j.cor.2024.106739
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In classification tasks with large sample sets, using a single classifier carries the risk of overfitting. To overcome this issue, an ensemble of classifier models has often been shown to outperform a single "best" model. Given the wide variety of classifier models available, selecting high-efficiency classifiers for a given dataset remains a pressing challenge. However, most previous classifier selection methods measure only classification output performance and do not consider computational cost. This paper proposes a new ensemble learning method that uses data envelopment analysis (DEA) to improve classification quality on big datasets. The method has two stages: classifier selection and classifier combination. In the first stage, commonly used classifiers are evaluated on both resource consumption and classification output performance using the range directional model (RDM), and the most efficient classifiers are selected. In the second stage, each classifier's confusion matrix is evaluated using the DEA cross-efficiency model, and the combination weights are derived from the cross-efficiency values so that classifiers with higher performance receive greater weights. Experimental results demonstrate the superiority of the cross-efficiency model over the BCC model and the benchmark voting method for model ensembling. Furthermore, the method is shown to save more computational resources and yield better results than existing methods.
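The combination stage described in the abstract can be illustrated with a minimal sketch of score-weighted voting. This is not the authors' implementation: the function name `weighted_vote` and the numeric scores are hypothetical, and the scores stand in for cross-efficiency values that the paper obtains from a DEA model (the DEA step itself is not reproduced here).

```python
from collections import defaultdict


def weighted_vote(predictions, scores):
    """Combine label predictions from several classifiers.

    predictions[k][i] is classifier k's predicted label for sample i.
    scores[k] is a positive score for classifier k; here it is a
    placeholder for a cross-efficiency value (an assumption, not
    computed from a DEA model in this sketch).
    """
    if len(predictions) != len(scores):
        raise ValueError("need exactly one score per classifier")
    total = sum(scores)
    # Normalize so the weights sum to 1; higher score means greater weight.
    weights = [s / total for s in scores]
    combined = []
    for i in range(len(predictions[0])):
        tally = defaultdict(float)
        for k, w in enumerate(weights):
            tally[predictions[k][i]] += w
        # Pick the label with the largest accumulated weight.
        combined.append(max(tally, key=tally.get))
    return combined


# Three hypothetical classifiers predicting labels for three samples.
preds = [[1, 0, 1], [0, 0, 1], [0, 1, 1]]
print(weighted_vote(preds, [0.9, 0.5, 0.3]))  # score-weighted vote
print(weighted_vote(preds, [1.0, 1.0, 1.0]))  # plain majority vote
```

With unequal scores, the first sample follows the highest-scored classifier even though the other two disagree with it, whereas equal scores reduce to plain majority voting.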
Pages: 17