A novel algorithm applied to classify unbalanced data

Cited by: 29
Authors
Lee, Chou-Yuan [1]
Lee, Zne-Jung [2]
Affiliations
[1] Lan Yang Inst Technol, Dept Informat Management, Taipei, Taiwan
[2] Huafan Univ, Dept Informat Management, Taipei, Taiwan
Keywords
Unbalanced data; Fuzzy C-means; Bacterial foraging optimization; Analysis of variance; Hybrid algorithm; FEATURE-SELECTION; PARAMETER DETERMINATION; MACHINE; CLASSIFICATION; OPTIMIZATION; LOAD
DOI
10.1016/j.asoc.2012.03.051
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unbalanced data, in which minority classes have few samples, appear in many fields. The mean of unbalanced data is difficult to formalize, so traditional algorithms are of limited use on such data. In this paper, a novel algorithm based on analysis of variance (ANOVA), fuzzy C-means (FCM), and bacterial foraging optimization (BFO) is proposed to classify unbalanced data. ANOVA measures the difference between the means of two or more groups by partitioning the observed variance into components attributable to different explanatory variables. FCM is a fuzzy clustering method that allows one data point to belong to two or more clusters. Natural selection tends to eliminate animals with poor foraging strategies and favors the propagation of the genes of animals with successful foraging strategies; BFO models this mechanism of natural selection and has been applied to many problems. The proposed algorithm combines the advantages of ANOVA, FCM, and BFO: ANOVA selects beneficial feature subsets, FCM assigns data to clusters with certain membership degrees, and BFO can converge quickly to global optima. Microarray data of ovarian cancer and the zoo dataset are used to test the performance of the proposed algorithm. In the simulation results, the proposed algorithm achieves higher classification accuracy than other existing approaches. (C) 2012 Elsevier B.V. All rights reserved.
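As a rough illustration of two of the building blocks named in the abstract, the sketch below computes a one-way ANOVA F statistic for a single feature and the standard FCM membership degrees of one point given fixed cluster centers. It is not the authors' implementation: the function names `anova_f_score` and `fcm_memberships`, the toy data, and the fuzzifier value `m=2.0` are assumptions made for illustration, and the BFO search step is not shown.

```python
import math


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for one feature, grouped by class label."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Variance is partitioned into between-group and within-group parts.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        return math.inf if ms_between > 0 else 0.0
    return ms_between / ms_within


def fcm_memberships(point, centers, m=2.0):
    """FCM membership degree of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among them.
    on_center = [d == 0 for d in dists]
    if any(on_center):
        return [1.0 / sum(on_center) if z else 0.0 for z in on_center]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Toy data (assumed): class 1 is the minority class with two samples.
labels = [0, 0, 0, 0, 1, 1]
informative = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2]  # separates the classes well
noisy = [1.0, 3.0, 2.0, 3.1, 2.0, 2.1]        # barely separates them

print(anova_f_score(informative, labels), anova_f_score(noisy, labels))
print(fcm_memberships((1.0, 1.0), [(0.0, 0.0), (3.0, 3.0)]))
```

Ranking features by F statistic and keeping the highest-scoring ones is one common way to use ANOVA for feature selection; in this toy example the informative feature scores far higher than the noisy one, and the point (1, 1) receives memberships of about 0.8 and 0.2 in the two clusters.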
Pages: 2481-2485
Page count: 5