Use of Semisupervised Clustering and Feature-Selection Techniques for Identification of Co-expressed Genes

Times cited: 12
Authors
Saha, Sriparna [1]
Alok, Abhay Kumar [1]
Ekbal, Asif [1]
Affiliation
[1] Indian Inst Technol, Dept Comp Sci Engn, Patna 800015, Bihar, India
Keywords
Feature selection; multiobjective optimization (MOO); semisupervised classification; symmetry-based distance; EXPRESSION; ALGORITHM; ENSEMBLE;
DOI
10.1109/JBHI.2015.2451735
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Studying the patterns hidden in gene-expression data helps in understanding the functionality of genes. Clustering techniques are widely used to identify natural partitionings in gene-expression data. Feature selection is the key to constraining dimensionality, because not all features are important from a clustering point of view. Moreover, a limited amount of supervised information can help fine-tune the obtained clustering solution. In this paper, the problem of simultaneous feature selection and semisupervised clustering is formulated as a multiobjective optimization (MOO) task. A modern simulated annealing-based MOO technique, AMOSA, is used as the underlying optimization methodology. Features and cluster centers are encoded together in a string, and genes are assigned to clusters using a point symmetry-based distance. Six optimization criteria based on several internal and external cluster validity indices are used. The supervised information is generated with a popular clustering technique, Fuzzy C-means. The appropriate subset of features, the proper number of clusters, and the proper partitioning are determined using the search capability of AMOSA. The effectiveness of the proposed semisupervised clustering technique, Semi-FeaClustMOO, is demonstrated on five publicly available benchmark gene-expression datasets. Comparisons with existing techniques for gene-expression data clustering show the superiority of the proposed technique. Statistical and biological significance tests have also been carried out.
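The abstract assigns genes to clusters with a point symmetry-based distance. The sketch below assumes the definition from the authors' earlier GAPS work (Bandyopadhyay and Saha, Pattern Recognition, 2007): reflect a point x about a centre c to get 2c - x, let d_sym be the mean distance from that reflected point to its knear nearest data points, and multiply d_sym by the Euclidean distance to obtain d_ps. The binary feature mask (standing in for the feature part of the string), the default knear=2, the threshold theta with its Euclidean fallback, and the names `ps_distance` and `assign_points` are illustrative assumptions, not the exact Semi-FeaClustMOO implementation.

```python
import numpy as np


def ps_distance(x, c, data, knear=2):
    """Return (d_ps, d_sym) for point x and centre c (assumed GAPS-style definition)."""
    reflected = 2.0 * c - x                        # mirror image of x about the centre c
    dists = np.linalg.norm(data - reflected, axis=1)
    d_sym = np.sort(dists)[:knear].mean()          # mean distance to the knear nearest points
    d_e = np.linalg.norm(x - c)                    # ordinary Euclidean distance
    return d_sym * d_e, d_sym


def assign_points(data, centres, feature_mask, knear=2):
    """Assign each row of `data` to a centre using only the features switched on in the mask."""
    sel = np.asarray(feature_mask, dtype=bool)     # hypothetical binary feature-selection part
    X, C = data[:, sel], centres[:, sel]
    # theta: largest nearest-neighbour distance in the projected data (assumed threshold rule)
    pair = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(pair, np.inf)
    theta = pair.min(axis=1).max()
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        scores = [ps_distance(x, c, X, knear) for c in C]
        k = int(np.argmin([d_ps for d_ps, _ in scores]))
        if scores[k][1] <= theta:
            labels[i] = k                          # x is symmetric enough about centre k
        else:
            labels[i] = int(np.argmin(np.linalg.norm(C - x, axis=1)))  # Euclidean fallback
    return labels


# Two symmetric groups in the first two features; the third feature is noise and is masked out.
data = np.array([[-1, 0, 5], [1, 0, -5], [0, -1, 3], [0, 1, -3],
                 [9, 10, 100], [11, 10, -100], [10, 9, 50], [10, 11, -50]], dtype=float)
centres = np.array([[0, 0, 0], [10, 10, 0]], dtype=float)
print(assign_points(data, centres, feature_mask=[1, 1, 0]))   # prints [0 0 0 0 1 1 1 1]
```

Restricting both the data and the centres to the selected columns is what couples feature selection to the clustering: a different mask changes every distance, and therefore the partition that the optimizer evaluates.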
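The supervised information is generated with Fuzzy C-means. Below is a plain NumPy sketch of standard Fuzzy C-means, alternating membership-weighted centre updates with the membership update u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)). The abstract does not say how the fuzzy partition is turned into supervised labels; the `pseudo_labels` helper, which labels only genes whose strongest membership reaches a threshold, is a hypothetical choice made for illustration, as are the default fuzzifier m=2 and the threshold 0.9.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard Fuzzy C-means; returns (centres, memberships U whose rows sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # random initial fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U


def pseudo_labels(U, threshold=0.9):
    """Hypothetical rule: label a gene only if its strongest membership reaches `threshold`."""
    best = U.argmax(axis=1)
    return np.where(U.max(axis=1) >= threshold, best, -1)   # -1 marks "unlabeled"


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centres, U = fuzzy_c_means(X, n_clusters=2)
print(pseudo_labels(U))
```

Which cluster receives index 0 depends on the random initialisation, so only the grouping, not the label values, is meaningful here.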
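AMOSA, the optimizer named in the abstract, compares candidate solutions by Pareto dominance and, as described in the AMOSA paper (Bandyopadhyay, Saha, Maulik and Deb, IEEE Transactions on Evolutionary Computation, 2008), by an "amount of domination" that multiplies the range-normalised differences over the objectives on which two solutions differ. The sketch below shows only these two comparisons; it assumes every objective is minimised (a validity index to be maximised can be negated), takes the objective ranges R_i as given, and omits the archive, clustering of the archive, and annealing schedule. The function names are hypothetical.

```python
import numpy as np


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return bool(np.all(a <= b) and np.any(a < b))


def amount_of_domination(a, b, ranges):
    """Product over differing objectives of |f_i(a) - f_i(b)| / R_i (AMOSA-style)."""
    a, b, r = (np.asarray(v, dtype=float) for v in (a, b, ranges))
    differ = a != b
    if not differ.any():
        return 0.0                                 # identical vectors: no domination
    return float(np.prod(np.abs(a[differ] - b[differ]) / r[differ]))


a, b, ranges = [1.0, 2.0], [2.0, 4.0], [2.0, 4.0]
print(dominates(a, b), amount_of_domination(a, b, ranges))   # prints True 0.25
```

Normalising by each objective's range keeps criteria with very different scales, such as internal and external validity indices, from dominating the product.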
Pages: 1171-1177
Page count: 7