A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

Cited by: 11
Authors
Nishanth, Kancherla Jonah [2]
Ravi, Vadlamani [1]
Affiliations
[1] IDRBT, Center of Excellence in CRM & Analytics, Hyderabad, Andhra Pradesh, India
[2] University of Hyderabad, Information Technology, Hyderabad, Andhra Pradesh, India
Source
Journal of Information Processing Systems | 2013, Vol. 9, No. 4
Keywords
Data Imputation; General Regression Neural Network (GRNN); Evolving Clustering Method (ECM); Imputation; K-Medoids clustering; K-Means clustering; MLP
DOI
10.3745/JIPS.2013.9.4.633
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
All imputation techniques proposed so far in the literature are offline techniques: they require many iterations to learn the characteristics of the data during training, and they consume considerable computational time. They are therefore unsuitable for applications that require imputation to be performed on demand and in near real time. This paper proposes a computational intelligence based architecture for online data imputation, together with extended versions of an existing offline imputation method. The proposed online imputation technique has two stages. In Stage 1, as part of a local learning strategy, the Evolving Clustering Method (ECM) replaces the missing values with cluster centers. In Stage 2, as part of a global approximation strategy, a General Regression Neural Network (GRNN) refines these approximate values. The extended offline techniques employ K-Means or K-Medoids in Stage 1 and a Multi Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 banking-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline method, K-Medoids+GRNN, and the proposed online method, ECM+GRNN, is statistically insignificant at the 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be used for imputation in place of the existing and proposed offline techniques. This is the main outcome of the study. Furthermore, using GRNN in Stage 2 uniformly reduced MAPE, for both the offline and online methods, on all datasets.
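The two-stage online scheme (ECM centers, then GRNN refinement, scored by MAPE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ECM follows the one-pass formulation of Song and Kasabov with a distance threshold `dthr`; an incomplete record is matched to its nearest cluster center using only its observed attributes; and Stage 2 is assumed to be an auto-associative GRNN (targets equal to inputs) built from the complete records with smoothing parameter `sigma`. All function names, `dthr`, `sigma` and the toy data are hypothetical.

```python
# Hypothetical sketch of ECM + GRNN imputation; not the paper's code.
import math
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing value


def dist(a: Sequence, b: Sequence, idx: Sequence[int]) -> float:
    """Euclidean distance computed only over the attribute indices in idx."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))


def ecm(data: List[List[float]], dthr: float) -> Tuple[List[List[float]], List[float]]:
    """One-pass Evolving Clustering Method over complete records.

    dthr (assumed parameter) caps how large a cluster may grow.
    Returns the cluster centers and their radii.
    """
    centers = [list(data[0])]
    radii = [0.0]
    all_idx = range(len(data[0]))
    for x in data[1:]:
        d = [dist(x, c, all_idx) for c in centers]
        if any(dj <= rj for dj, rj in zip(d, radii)):
            continue  # x already falls inside an existing cluster
        s = [dj + rj for dj, rj in zip(d, radii)]
        a = min(range(len(s)), key=s.__getitem__)
        if s[a] > 2 * dthr:
            centers.append(list(x))  # too far from every cluster: start a new one
            radii.append(0.0)
        else:
            # Enlarge cluster a so that x lies on its new boundary; the center
            # moves along the line from x toward the old center.
            r_new = s[a] / 2
            k = r_new / d[a]
            centers[a] = [xi + (ci - xi) * k for xi, ci in zip(x, centers[a])]
            radii[a] = r_new
    return centers, radii


def stage1(row: Row, centers: List[List[float]]) -> List[float]:
    """Fill missing values from the nearest center (matched on observed attributes)."""
    obs = [i for i, v in enumerate(row) if v is not None]
    best = min(centers, key=lambda c: dist(row, c, obs))
    return [best[i] if v is None else v for i, v in enumerate(row)]


def grnn(x: List[float], train: List[List[float]], sigma: float) -> List[float]:
    """Auto-associative GRNN: Gaussian-kernel weighted average of training records."""
    all_idx = range(len(x))
    d2 = [dist(x, t, all_idx) ** 2 for t in train]
    d2_min = min(d2)  # shift exponents so the largest weight is exactly 1
    w = [math.exp(-(v - d2_min) / (2 * sigma ** 2)) for v in d2]
    total = sum(w)
    return [sum(wi * t[j] for wi, t in zip(w, train)) / total for j in all_idx]


def impute(row: Row, complete: List[List[float]], centers: List[List[float]],
           sigma: float) -> List[float]:
    """Stage 1 (ECM centers) followed by Stage 2 (GRNN refinement)."""
    approx = stage1(row, centers)
    refined = grnn(approx, complete, sigma)
    # Observed values are kept; only the missing positions take the GRNN output.
    return [refined[i] if v is None else v for i, v in enumerate(row)]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    complete = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 6.0, 7.0], [5.2, 6.1, 7.1]]
    centers, radii = ecm(complete, dthr=0.5)
    print(impute([1.05, None, 3.0], complete, centers, sigma=0.5))
```

In this sketch ECM makes a single pass over the data and the GRNN needs no iterative training, so imputing a new record costs one nearest-center lookup plus one kernel-weighted sum over the stored records. That property is what the abstract's online-versus-offline argument relies on.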
Pages: 633-650
Page count: 18