A financial statement fraud model based on synthesized attribute selection and a dataset with missing values and imbalanced classes

Cited by: 22
Authors
Cheng, Ching-Hsue [1]
Kao, Yung-Fu [1]
Lin, Hsien-Ping [2]
Affiliations
[1] Natl Yunlin Univ Sci & Technol, Dept Informat Management, 123,Sect 3,Univ Rd, Touliu 640, Yunlin, Taiwan
[2] Natl Yunlin Univ Sci & Technol, Dept Finance, 123,Sect 3,Univ Rd, Touliu 640, Yunlin, Taiwan
Keywords
Financial fraud; Feature selection; Rule-based method; Oversampling; Undersampling; Discriminant analysis; Earnings management; Information; Prediction; Ensemble; Trees; SMOTE
DOI
10.1016/j.asoc.2021.107487
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many financial fraud events have occurred over the past few decades, leading to massive losses for investors. Government officials have therefore begun to focus on the problem and have issued several decrees (acts) on financial fraud. Many scholars have explored the factors behind financial fraud, and although their models performed well, most studies have not generated a set of useful rules to support auditors. Furthermore, financial statement fraud data usually pose an imbalanced class problem, which previous work has barely addressed. This study therefore builds a detection model for financial statement fraud that handles both missing values and imbalanced classes. First, it uses listwise and pairwise deletion to remove missing values. Second, it proposes three merged attribute selection methods and applies a nonlinear distance correlation to select important attributes. Third, it applies undersampling and oversampling to address the imbalanced classes. Finally, it uses rule-based classifiers to generate a set of useful rules. In practice, this study uses a list of fraudulent companies to collect data on financial statement fraud. The results are summarized as follows: (1) pairwise deletion removes fewer records than listwise deletion when handling missing values; (2) the merged attribute selection method (Com_I4) performs best on the four evaluation criteria; (3) oversampling enhances accuracy and yields the lowest type 1 and type 2 errors; (4) the random forest with Com_I4 builds the optimal model of financial statement fraud under pairwise deletion and random oversampling; and (5) ensemble learning (random forest) is a robust model in this study. These results can serve as a reference for practitioners, investors, and auditing personnel. © 2021 Elsevier B.V. All rights reserved.
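The first and third steps of the abstract (deleting records with missing values, then rebalancing the classes) can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names (`debt_ratio`, `roa`, `fraud`), the toy records, and the helper functions are all hypothetical, and pairwise deletion is assumed here to mean keeping every record that is complete on the attributes used in a given analysis.

```python
import random
from collections import Counter


def listwise_delete(records):
    """Keep only records with no missing (None) value in any attribute."""
    return [r for r in records if all(v is not None for v in r.values())]


def pairwise_delete(records, attributes):
    """Keep records that are complete on the attributes one analysis uses."""
    return [r for r in records if all(r[a] is not None for a in attributes)]


def random_oversample(records, label="fraud", seed=0):
    """Resample smaller classes with replacement up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label], []).append(r)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Draw extra copies only for classes below the target size.
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


# Hypothetical toy data: None marks a missing value, fraud=1 is the rare class.
records = [
    {"debt_ratio": 0.61, "roa": 0.02, "fraud": 0},
    {"debt_ratio": 0.45, "roa": None, "fraud": 0},
    {"debt_ratio": None, "roa": 0.05, "fraud": 0},
    {"debt_ratio": 0.52, "roa": 0.04, "fraud": 0},
    {"debt_ratio": 0.88, "roa": None, "fraud": 1},
]

complete = listwise_delete(records)
usable = pairwise_delete(records, ["debt_ratio", "fraud"])
balanced = random_oversample(usable)
class_counts = Counter(r["fraud"] for r in balanced)

print(len(complete), len(usable), dict(class_counts))
```

On this toy data, listwise deletion discards the only fraudulent record because its `roa` value is missing, while pairwise deletion on `debt_ratio` retains it. That is the practical reason a method that removes fewer records can matter when the fraud class is already rare.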
Pages: 19