A greedy correlation measure based attribute clustering algorithm for gene selection

Cited by: 1
Authors
Xu, Jiucheng [1]
Gao, Yunpeng [1]
Li, Shuangqun [1]
Sun, Lin [1]
Xu, Tianhe [1]
Ren, Jinyu [1]
Affiliation
[1] College of Computer and Information Technology, Henan Normal University, Xinxiang
Source
Journal of Computers (Finland) | 2013 / Vol. 8 / No. 4
Keywords
Attribute clustering; Correlation; Gene selection; Neighborhood mutual information; Significant multiple correlation
DOI
10.4304/jcp.8.4.951-959
Abstract
This paper proposes an attribute clustering algorithm that groups attributes into clusters in order to extract meaningful patterns from microarray data. First, the attribute clustering problem is analyzed and neighborhood mutual information is introduced to solve it. An attribute clustering algorithm is then presented that groups attributes into clusters by optimizing a criterion function derived from an information measure reflecting the correlation between attributes. Applied to gene expression data, the method discovers meaningful clusters that help capture aspects of gene association patterns, so that significant genes carrying useful information for gene classification and identification can be selected. The proposed algorithm is applied to six gene expression data sets and compared with several well-known gene selection methods. Experiments show that the greedy correlation measure based attribute clustering algorithm, denoted GCMACA, is more capable of discovering meaningful clusters of genes. By selecting a subset of genes that have a high significant multiple correlation value with the other genes in their cluster, informative genes can be acquired and gene expression of different categories can be identified as well. © 2013 ACADEMY PUBLISHER.
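The abstract names neighborhood mutual information as the correlation measure but does not define it. The following is a minimal sketch of one commonly used definition, NMI(R; S) = -(1/n) Σ log2(|δ_R(x_i)| · |δ_S(x_i)| / (n · |δ_{R,S}(x_i)|)), where δ(x_i) is the set of samples within distance `delta` of sample i. The function names, the default `delta`, and the use of a per-feature absolute difference (Chebyshev distance) are assumptions made for illustration; this is not the paper's implementation and may differ from the criterion function GCMACA optimizes.

```python
import math

def neighborhood_sizes(columns, delta):
    """Count, for each sample, how many samples lie within `delta`
    on every feature in `columns` (the sample itself is included)."""
    n = len(columns[0])
    sizes = []
    for i in range(n):
        count = 0
        for j in range(n):
            if all(abs(col[i] - col[j]) <= delta for col in columns):
                count += 1
        sizes.append(count)
    return sizes

def neighborhood_mutual_information(r, s, delta=0.1):
    """Neighborhood mutual information between two numeric features.

    r, s: equal-length sequences of feature values (e.g. two genes
    measured over the same samples). delta is a hypothetical default.
    """
    n = len(r)
    n_r = neighborhood_sizes([r], delta)      # neighborhoods on r alone
    n_s = neighborhood_sizes([s], delta)      # neighborhoods on s alone
    n_rs = neighborhood_sizes([r, s], delta)  # joint neighborhoods
    total = sum(math.log2(a * b / (n * c)) for a, b, c in zip(n_r, n_s, n_rs))
    return -total / n
```

Under this definition a feature that is constant across samples shares no information with any other feature, while a feature compared with itself yields its own neighborhood entropy. Gene expression values would normally be scaled to a common range before a single `delta` is applied to all genes.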
Pages: 951-959
Page count: 8