Feature selection using correlation fractal dimension: Issues and applications in binary classification problems

Cited by: 19
Authors
Bhavani, S. Durga [1 ]
Rani, T. Sobha [1 ]
Bapi, Raju S. [1 ]
Affiliation
[1] Univ Hyderabad, Computat Intelligence Lab, Dept Comp & Informat Sci, Hyderabad 500046, Andhra Pradesh, India
Keywords
supervised learning; feature subset selection; fractal dimension; machine learning; neural networks;
DOI
10.1016/j.asoc.2007.03.007
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection methods fall broadly into filter and wrapper approaches. Filter-based methods remove features that are irrelevant to the target concept by ranking each feature according to some discrimination measure and then selecting the features with high ranking values. In this paper, a filter-based feature selection method that uses the correlation fractal dimension (CFD) as its discrimination measure is proposed. One subgoal of the paper is to outline issues that arise when calculating the fractal dimension of higher dimensional 'empirical' data sets. It is well known that the fractal dimension of an empirical data set is meaningful only over an appropriate range of scales. We demonstrate through experiments on data sets of various sizes that fractal dimension-based algorithms cannot be applied routinely to higher dimensional data sets, because the calculation is inherently sensitive to parameters such as the range of scales and the size of the data set. Based on this empirical analysis, we propose a new feature selection technique using CFD that avoids these issues. We successfully applied the proposed algorithm to a challenging classification problem in bioinformatics, namely promoter recognition. © 2007 Elsevier B.V. All rights reserved.
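To make the central quantity concrete, the following is a minimal sketch of a correlation fractal dimension estimate in the style of the Grassberger-Procaccia correlation integral. It is not the paper's algorithm: the function name `correlation_dimension`, the choice of radii, and the synthetic data are all assumptions made for illustration. The estimate is the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r, and the result depends on the range of scales supplied.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Slope of log C(r) versus log r over the given radii (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    # Euclidean distances between all distinct pairs (upper triangle only).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    # Correlation integral C(r): fraction of pairs closer than r.
    c = np.array([(dists < r).mean() for r in radii])
    keep = c > 0  # log is undefined where no pair falls within r
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return slope

rng = np.random.default_rng(0)
radii = np.logspace(-2, -1, 8)  # assumed range of scales

# Two features, but the second is a multiple of the first (redundant).
t = rng.random(1000)
line = np.column_stack([t, 2 * t])
# Two independent features filling the unit square.
square = rng.random((1000, 2))

d_line = correlation_dimension(line, radii)      # expected to be close to 1
d_square = correlation_dimension(square, radii)  # expected to be close to 2
print(d_line, d_square)
```

In this toy setting the redundant second feature leaves the estimated dimension near 1, which is the intuition behind using fractal dimension to detect features that add no information. Changing `radii` or the sample size shifts the estimates, in line with the sensitivity the abstract describes.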
Pages: 555-563
Number of pages: 9