A kernel-based clustering method for gene selection with gene expression data

Cited by: 49
Authors
Chen, Huihui [1 ]
Zhang, Yusen [1 ]
Gutman, Ivan [2 ]
Affiliations
[1] Shandong Univ Weihai, Sch Math & Stat, Weihai 264209, Peoples R China
[2] Univ Kragujevac, Fac Sci, POB 60, Kragujevac 34000, Serbia
Keywords
Gene expression data; Kernel-based clustering; Adaptive distance; Gene selection; Cancer classification; Prediction; Algorithm; Discovery
DOI
10.1016/j.jbi.2016.05.007
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Gene selection is important for cancer classification based on gene expression data because such data combine high dimensionality with a small sample size. In this paper, we present a new clustering-based gene selection method in which dissimilarity measures are obtained through kernel functions. The method iteratively searches for the best gene weights while simultaneously optimizing the clustering objective function. An adaptive distance is used to learn the gene weights during clustering, which improves the performance of the algorithm. The proposed algorithm is simple and does not require any modification or parameter optimization for each dataset. We tested it on eight publicly available datasets with two classifiers (support vector machine and k-nearest neighbor) and compared it with six other competitive feature selectors. The results show that the proposed algorithm is capable of achieving better accuracies and may be an efficient tool for finding possible biomarkers from gene expression data. (C) 2016 Elsevier Inc. All rights reserved.
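The abstract does not give the equations, so the following is only a rough sketch of what kernel-based clustering with an adaptive, per-gene weighted distance could look like. It assumes a per-feature Gaussian kernel, hard assignments, and weights constrained to have product 1; the function name `adaptive_kernel_clustering` and the parameters `sigma2`, `iters`, and `init_idx` are hypothetical and are not taken from the paper.

```python
import math
import random


def gauss(a, b, sigma2):
    """Gaussian kernel evaluated on a single feature (gene) value."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma2))


def adaptive_kernel_clustering(X, k, sigma2=1.0, iters=20, init_idx=None, seed=0):
    """Hard kernel clustering with one adaptive weight per feature.

    X is a list of samples, each a list of p feature values.
    Returns (labels, weights); a larger weight marks a feature whose
    values are tighter within clusters. Illustrative assumption only,
    not the authors' exact algorithm.
    """
    n, p = len(X), len(X[0])
    if init_idx is None:
        init_idx = random.Random(seed).sample(range(n), k)
    protos = [list(X[i]) for i in init_idx]
    weights = [1.0] * p
    labels = [0] * n
    eps = 1e-12  # keeps the per-feature dispersions strictly positive

    def dist(x, v):
        # Weighted sum of per-feature kernel-induced squared distances.
        return sum(weights[j] * 2.0 * (1.0 - gauss(x[j], v[j], sigma2))
                   for j in range(p))

    for _ in range(iters):
        # Step 1: assign every sample to its nearest prototype.
        labels = [min(range(k), key=lambda c: dist(x, protos[c])) for x in X]

        # Step 2: fixed-point prototype update (kernel-weighted mean).
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if not members:
                continue  # leave an empty cluster's prototype unchanged
            for j in range(p):
                ks = [gauss(x[j], protos[c][j], sigma2) for x in members]
                protos[c][j] = sum(kv * x[j] for kv, x in zip(ks, members)) / sum(ks)

        # Step 3: update feature weights under the constraint prod(weights) = 1.
        disp = [eps + sum(2.0 * (1.0 - gauss(x[j], protos[lab][j], sigma2))
                          for x, lab in zip(X, labels))
                for j in range(p)]
        geo_mean = math.exp(sum(math.log(d) for d in disp) / p)
        weights = [geo_mean / d for d in disp]

    return labels, weights
```

Under these assumptions, genes could then be ranked by their learned weights and the top-ranked ones passed to a classifier such as a support vector machine or k-nearest neighbor, as in the evaluation the abstract describes.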
Pages: 12-20
Page count: 9