ON GENE SELECTION AND CLASSIFICATION FOR CANCER MICROARRAY DATA USING MULTI-STEP CLUSTERING AND SPARSE REPRESENTATION

Cited by: 1
Authors
Jing, Liping [1 ]
Ng, Michael K. [2 ]
Zeng, Tieyong [2 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Gene selection; cancer prediction; Lasso; clustering; classification;
DOI
10.1142/S1793536911000763
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
Microarray data profile gene expression on a whole-genome scale and provide a good way to study associations between gene expression and the occurrence or progression of cancer. Many researchers have recognized that microarray data are useful for predicting cancer cases. However, the dimension of the gene expression data, which is significantly larger than the sample size, makes this task very difficult, so it is important to identify the significant genes causing cancer. Many feature selection algorithms have been proposed that focus on improving cancer predictive accuracy at the expense of ignoring the correlations between features. In this work, a novel framework (named SGS) is presented for significant gene selection and efficient cancer case classification. The proposed framework first applies a clustering algorithm to find gene groups in which the genes have higher correlation coefficients with one another; it then selects (1) the significant genes in each group using the Bayesian Lasso method and (2) the important gene groups using the group Lasso method; finally, it builds a prediction model on the shrunken gene space with an efficient classification algorithm (such as support vector machine (SVM), 1NN, or regression). Experimental results on publicly available microarray data show that the proposed framework often outperforms existing feature selection and prediction methods such as SAM, information gain (IG), and Lasso-type prediction models.
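The three stages named in the abstract (group correlated genes, select genes within groups by a Lasso-type penalty, classify on the reduced gene space) can be illustrated with a minimal Python sketch. It is not the authors' SGS implementation. The abstract does not name the clustering algorithm, so a greedy correlation-threshold grouping is assumed here; a plain Lasso solved by coordinate descent stands in for the Bayesian Lasso; the group Lasso step is omitted; and 1NN is used as the classifier. All function names (`cluster_genes`, `lasso`, `select_genes`, `predict_1nn`), the threshold 0.8, the penalty 0.1, and the toy expression values are hypothetical choices made for illustration.

```python
import math

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def pearson(u, v):
    u, v = center(u), center(v)
    denom = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / denom if denom else 0.0

def cluster_genes(genes, threshold=0.8):
    """Assumed clustering step: a gene joins the first group whose seed gene
    it correlates with (|r| >= threshold); otherwise it seeds a new group."""
    groups = []
    for j, g in enumerate(genes):
        for group in groups:
            if abs(pearson(genes[group[0]], g)) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups

def lasso(genes, y, lam=0.1, sweeps=200):
    """Plain Lasso (a stand-in for the Bayesian Lasso) by cyclic coordinate
    descent on centered data: minimize (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n = len(y)
    cols = [center(g) for g in genes]
    resid = center(y)                      # residual y - Xw for w = 0
    w = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            z = sum(x * x for x in col) / n
            if z == 0:
                continue
            rho = sum(x * (r + w[j] * x) for x, r in zip(col, resid)) / n
            new_w = math.copysign(max(abs(rho) - lam, 0.0), rho) / z
            delta = new_w - w[j]
            if delta:
                resid = [r - delta * x for r, x in zip(resid, col)]
                w[j] = new_w
    return w

def select_genes(genes, y, threshold=0.8, lam=0.1):
    """Cluster genes, then keep genes with a nonzero Lasso weight in their group."""
    selected = []
    for group in cluster_genes(genes, threshold):
        weights = lasso([genes[j] for j in group], y, lam)
        selected += [j for j, w in zip(group, weights) if abs(w) > 1e-8]
    return sorted(selected)

def predict_1nn(train_rows, labels, selected, sample):
    """1-nearest-neighbour prediction restricted to the selected genes."""
    def dist(row):
        return sum((row[j] - sample[j]) ** 2 for j in selected)
    nearest = min(range(len(train_rows)), key=lambda i: dist(train_rows[i]))
    return labels[nearest]

# Toy data: 8 samples, 5 genes (values are made up for illustration).
a = [1, 1, 1, 1, -1, -1, -1, -1]           # class labels (+1 / -1)
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
genes = [
    [2 * ai + 0.5 * ai * bi for ai, bi in zip(a, b)],  # gene 0: informative
    [2 * ai - 0.5 * ai * bi for ai, bi in zip(a, b)],  # gene 1: correlated with gene 0
    [float(ci) for ci in c],                            # gene 2: unrelated to the label
    [ci + 0.2 * bi * ci for bi, ci in zip(b, c)],       # gene 3: correlated with gene 2
    [float(bi) for bi in b],                            # gene 4: unrelated singleton
]
groups = cluster_genes(genes)
selected = select_genes(genes, a)
train_rows = list(zip(*genes))                          # samples x genes
print(groups, selected)
print(predict_1nn(train_rows, a, selected, [2.2, 1.9, 0.3, -0.4, 1.0]))
```

On this toy data the sketch places genes 0 and 1 in one group and keeps only those two, because the genes in the other groups have no linear association with the label and their Lasso weights shrink to zero. A fuller version would add the group-level selection step and evaluate on held-out samples rather than on hand-written test points.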
Pages: 127-148
Page count: 22
Related papers
50 in total
  • [1] Cancer Classification by Sparse Representation using Microarray Gene Expression Data
    Hang, Xiyi
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS, PROCEEDINGS, 2008, : 174 - 177
  • [2] Gene selection for cancer classification in microarray data
    Zhang, Lijuan
    Li, Zhoujun
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2009, 46 (05): : 794 - 802
  • [3] Gene Selection for Cancer Classification from Microarray Data Using Data Overlap Measure
    Sarbazi-Azad, Saeed
    Abadeh, Mohammad Saniee
    2018 25TH IRANIAN CONFERENCE ON BIOMEDICAL ENGINEERING AND 2018 3RD INTERNATIONAL IRANIAN CONFERENCE ON BIOMEDICAL ENGINEERING (ICBME), 2018, : 257 - 262
  • [4] Gene subset selection in microarray data using entropic filtering for cancer classification
    Navarro, Felix F. Gonzalez
    Munoz, Lluis A. Belanche
    EXPERT SYSTEMS, 2009, 26 (01) : 113 - 124
  • [5] A Comparative Study of Gene Selection Methods for Cancer Classification Using Microarray Data
    Babu, Manish
    Sarkar, Kamal
    2016 SECOND IEEE INTERNATIONAL CONFERENCE ON RESEARCH IN COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (ICRCICN), 2016, : 204 - 211
  • [6] Spatial clustering based gene selection for gene expression analysis in microarray data classification
    Dhas, P. Edwin
    Lalitha, S.
    Govindaraj, Annalakshmi
    Jyoshna, B.
    AUTOMATIKA, 2024, 65 (01) : 152 - 158
  • [7] Gene selection in microarray data analysis for brain cancer classification
    Leung, Y. Y.
    Chang, C. Q.
    Hung, Y. S.
    Fung, P. C. W.
    2006 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS, 2006, : 99 - +
  • [8] A Combined Clustering and Ranking based Gene Selection Algorithm for Microarray Data Classification
    Rani, M. Jansi
    Devaraj, D.
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2017, : 183 - 187
  • [9] Gene selection for microarray data classification via dual latent representation learning
    Zheng, Xiao
    Zhang, Chujie
    NEUROCOMPUTING, 2021, 461 : 266 - 280
  • [10] Sparse Representation for Classification of Tumors Using Gene Expression Data
    Hang, Xiyi
    Wu, Fang-Xiang
    JOURNAL OF BIOMEDICINE AND BIOTECHNOLOGY, 2009,