An Integrated machine learning and DEA-predefined performance outcome prediction framework with high-dimensional imbalanced data

Cited by: 3
Authors
Shi, Yu [1, 3]
Zhao, Wei [2]
Affiliations
[1] Drake Univ, Coll Business & Publ Adm, Des Moines, IA USA
[2] Worcester Polytech Inst, Dept Biomed Engn, Worcester, MA USA
[3] Drake Univ, Coll Business & Publ Adm, Des Moines, IA 50311 USA
Keywords
Data envelopment analysis; machine learning; feature selection; performance evaluation; contextual variables; DATA ENVELOPMENT ANALYSIS; BANK BRANCH EFFICIENCY; CREDIT-RISK; BANKRUPTCY PREDICTION; OPERATING EFFICIENCY; FINANCIAL RATIOS; SMOTE; OUTLIERS; MODEL
DOI
10.1080/03155986.2023.2168943
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In performance evaluation, emerging studies use machine learning to increase the interpretability and robustness of data envelopment analysis (DEA), a non-parametric tool for assessing the relative performance of decision-making units (DMUs). In these studies, the machine learning component typically does not replicate the DEA process of directly labeling DMUs by their relative performance, and no standardized methodological framework serves this purpose. We propose a data-driven and computationally efficient framework that imitates DEA and predicts performance outcomes grouped into several classes. First, a DEA composite index was constructed, and the resulting DEA scores were assigned to one of three classes: good, acceptable, or underperforming. Next, the synthetic minority oversampling technique (SMOTE) with the Manhattan distance metric was used to address class imbalance in the labeled, high-dimensional dataset. The framework was built with several classifiers, including random forest, support vector machine, and logistic regression, to verify that it is not model-dependent; these classifiers achieved comparable recall rates (82.70%-95.39%). Moreover, the impacts of contextual variables on DMU performance were identified using model-based feature selection and logistic regression. The framework was tested on a banking dataset and on an independent dataset covering the electronics, service, and retail industries.
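The labeling and oversampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`label_scores`, `manhattan`, `smote_manhattan`) and the class cut-offs (0.9 and 0.6) are hypothetical, chosen only for illustration, and the DEA scores are assumed to be already computed and to lie in [0, 1]. The oversampler follows the general SMOTE idea of interpolating between a minority sample and one of its nearest minority-class neighbours, with neighbours ranked by Manhattan (L1) distance.

```python
import random


def label_scores(scores, good=0.9, acceptable=0.6):
    """Map DEA scores to three classes. The cut-offs are hypothetical."""
    labels = []
    for s in scores:
        if s >= good:
            labels.append("good")
        elif s >= acceptable:
            labels.append("acceptable")
        else:
            labels.append("underperforming")
    return labels


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def smote_manhattan(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours, where
    nearness is measured with the Manhattan distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by L1 distance to the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: manhattan(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

In a fuller pipeline, the synthetic samples would be appended to the training split only, after which any classifier (for example random forest, support vector machine, or logistic regression) could be fitted and compared on per-class recall.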
Pages: 100-129
Page count: 30