An integrated machine learning and DEA-predefined performance outcome prediction framework with high-dimensional imbalanced data

Cited by: 3

Authors:

Shi, Yu [1,3]

Zhao, Wei [2]

Affiliations:

[1] Drake Univ, Coll Business & Publ Adm, Des Moines, IA USA

[2] Worcester Polytech Inst, Dept Biomed Engn, Worcester, MA USA

[3] Drake Univ, Coll Business & Publ Adm, Des Moines, IA 50311 USA

Source:

INFOR | 2024 / Vol. 62 / Issue 01

Keywords:

Data envelopment analysis; machine learning; feature selection; performance evaluation; contextual variables; DATA ENVELOPMENT ANALYSIS; BANK BRANCH EFFICIENCY; CREDIT-RISK; BANKRUPTCY PREDICTION; OPERATING EFFICIENCY; FINANCIAL RATIOS; SMOTE; OUTLIERS; MODEL;

DOI:

10.1080/03155986.2023.2168943

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In performance evaluation, emerging studies utilize machine learning to increase the interpretability and robustness of data envelopment analysis (DEA), a non-parametric tool for assessing the relative performance of decision-making units (DMUs). In these studies, the machine learning dynamics typically do not replicate the DEA process in terms of directly labeling DMUs based on their relative performance. Practically, there is no standardized methodological framework that serves this purpose. We propose a data-driven and computationally efficient system that imitates DEA and predicts performance outcomes, which are grouped into several classes. First, a DEA composite index was constructed, and the subsequent DEA scores were labeled as the good, the acceptable, and the underperforming classes. Next, synthetic minority oversampling technique (SMOTE) with Manhattan distance metric was used to solve class imbalance in the labeled, high-dimensional dataset. The framework was built using different classifiers, including random forest, support vector machine, and logistic regression, to verify that the framework is not model-dependent. They achieved comparable recall rates (82.70%-95.39%). Moreover, the impacts of contextual variables on DMU performance were unveiled using model-based feature selection and logistic regression. The framework was tested on a banking dataset and an independent dataset containing the electronics, service, and retail industries.
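The labeling and oversampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the DEA scores are already computed, uses hypothetical class cut-offs (0.9 and 0.6) that the abstract does not specify, and replaces a library SMOTE with a small pure-Python stand-in that finds neighbours by Manhattan distance. All function names and the toy data are made up for illustration.

```python
import random

def label_scores(scores, good=0.9, acceptable=0.6):
    """Map DEA scores to three classes (cut-offs are hypothetical)."""
    labels = []
    for s in scores:
        if s >= good:
            labels.append("good")
        elif s >= acceptable:
            labels.append("acceptable")
        else:
            labels.append("underperforming")
    return labels

def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def smote_manhattan(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (Manhattan distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: manhattan(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

def balance(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        need = target - len(rows)
        if need > 0 and len(rows) > 1:
            X_out.extend(smote_manhattan(rows, need, k=k, seed=seed))
            y_out.extend([label] * need)
    return X_out, y_out

# Toy data: ten made-up DEA scores and two made-up features per DMU.
scores = [1.00, 0.93, 0.72, 0.66, 0.81, 0.64, 0.78, 0.45, 0.30, 0.69]
X = [[s, 1.0 - s] for s in scores]
y = label_scores(scores)          # 2 good, 6 acceptable, 2 underperforming
X_bal, y_bal = balance(X, y)      # each class now has 6 rows
```

The balanced rows could then be passed to any classifier, such as the random forest, support vector machine, or logistic regression models named in the abstract, and per-class recall compared across them.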


Pages: 100-129

Number of pages: 30

50 records in total

[11] Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study
Pes, Barbara
Lai, Giuseppina
PEERJ COMPUTER SCIENCE, 2021, 7
[13] Customer purchase prediction from the perspective of imbalanced data: A machine learning framework based on factorization machine
Chen, Shui-xia
Wang, Xiao-kang
Zhang, Hong-yu
Wang, Jian-qiang
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 173
[14] An Efficient Machine Learning Framework for Stress Prediction via Sensor Integrated Keyboard Data
Pankajavalli, P. B.
Karthick, G. S.
Sakthivel, R.
IEEE ACCESS, 2021, 9 : 95023 - 95035
[15] Novel machine learning approach for classification of high-dimensional microarray data
Musheer, Rabia Aziz
Verma, C. K.
Srivastava, Namita
SOFT COMPUTING, 2019, 23 (24) : 13409 - 13421
[16] HIBoost: A hubness-aware ensemble learning algorithm for high-dimensional imbalanced data classification
Wu, Qin
Lin, Yaping
Zhu, Tuanfei
Zhang, Yue
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (01) : 133 - 144
[17] Finding causative genes from high-dimensional data: an appraisal of statistical and machine learning approaches
Wang, Chamont
Gevertz, Jana L.
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2016, 15 (04) : 321 - 347
[18] Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework
Gao, Ruize
Cui, Shaoze
Wang, Yu
Xu, Wei
FINANCIAL INNOVATION, 2025, 11 (01)
[19] B2FSE framework for high dimensional imbalanced data: A case study for drug toxicity prediction
Hooda, Nishtha
Bawa, Seema
Rana, Prashant Singh
NEUROCOMPUTING, 2018, 276 : 31 - 41
[20] A Sparse Learning Machine for High-Dimensional Data with Application to Microarray Gene Analysis
Cheng, Qiang
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2010, 7 (04) : 636 - 646
