Feature Selection for Genomic Signal Processing: Unsupervised, Supervised, and Self-Supervised Scenarios

Cited by: 0
Authors
S. Y. Kung
Yuhui Luo
Man-Wai Mak
Affiliations
[1] Princeton University, Department of Electronic and Information Engineering
[2] National Chung Hsing University
[3] The Hong Kong Polytechnic University
Source
Journal of Signal Processing Systems | 2010 / Vol. 61
Keywords
Feature selection; Genomics; Unsupervised; Supervised; Self-supervised; Microarray; Sequence; Filter; Wrapper
DOI
Not available
Abstract
The effectiveness of a data mining system depends on the representation of pattern vectors. For many bioinformatic applications, data are represented as vectors of extremely high dimension. This motivates research on feature selection. The literature contains many reports on feature selection methods. In terms of training data types, they are divided into unsupervised and supervised categories. In terms of selection methods, they fall into filter and wrapper categories. This paper provides a brief overview of state-of-the-art feature selection methods in all these categories. Sample applications of these methods to genomic signal processing are highlighted. This paper also describes a notion of self-supervision. A special method called vector index adaptive SVM (VIA-SVM) is described for selecting features under the self-supervision scenario. Furthermore, the paper makes use of a more powerful symmetric doubly supervised formulation, for which VIA-SVM is particularly useful. Based on several subcellular localization experiments and microarray time course experiments, the VIA-SVM algorithm, when combined with some filter-type metrics, appears to deliver a substantial dimension reduction (one order of magnitude) with only a slight degradation in accuracy.
Pages: 3-20
Page count: 17