Feature weight estimation for gene selection: a local hyperlinear learning approach

Cited by: 40
Authors
Cai, Hongmin [1 ]
Ruan, Peiying [2 ]
Ng, Michael [3 ]
Akutsu, Tatsuya [2 ]
Affiliations
[1] S China Univ Technol, Sch Engn & Comp Sci, Guangzhou, Guangdong, Peoples R China
[2] Kyoto Univ, Inst Chem Res, Kyoto 606, Japan
[3] Hong Kong Baptist Univ, Dept Math, Hong Kong, Hong Kong, Peoples R China
Source
BMC BIOINFORMATICS | 2014, Vol. 15
Keywords
Feature weighting; Local hyperplane; Classification; RELIEF; KNN; CANCER CLASSIFICATION; DISCRIMINANT-ANALYSIS;
DOI
10.1186/1471-2105-15-70
CLC classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes buried in high-dimensional irrelevant noise. RELIEF is a widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results: We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR builds on RELIEF, estimating feature weights through classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than the global measurement typically used in existing methods. The weights obtained by our method are robust to degradation from noisy features, even in very high-dimensional settings. To demonstrate the performance of our method, extensive classification experiments have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion: Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms.
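For context, the margin-based weighting scheme that LHR extends can be sketched as the classic global RELIEF baseline: a feature's weight grows when the nearest sample of the opposite class (the "miss") differs on it and shrinks when the nearest same-class sample (the "hit") does. This is a minimal illustration of that baseline, not the authors' local-hyperplane (LHR) variant; all function names and the toy data are illustrative.

```python
import numpy as np

def relief_weights(X, y, rng=None):
    """Classic RELIEF feature weighting (global hit/miss version).

    For each instance, find its nearest hit (same class) and nearest
    miss (other class); reward features on which the miss differs and
    penalize features on which the hit differs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    # Scale per-feature differences to [0, 1] so weights are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / span      # (n, d) scaled feature gaps
        dist = diff.sum(axis=1)             # L1 distance to every sample
        dist[i] = np.inf                    # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += diff[miss] - diff[hit]
    return w / n

# Toy example: feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])
w = relief_weights(X, y)   # w[0] positive (informative), w[1] negative (noise)
```

The paper's criticism applies to exactly this global step: a single noisy dimension can flip which neighbor is the nearest hit or miss, destabilizing the weights, which is what LHR's local-hyperplane approximation is designed to avoid.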
Pages: 13