A greedy correlation measure based attribute clustering algorithm for gene selection

Cited by: 1

Authors:

Xu, Jiucheng [1]

Gao, Yunpeng [1]

Li, Shuangqun [1]

Sun, Lin [1]

Xu, Tianhe [1]

Ren, Jinyu [1]

Affiliation:

[1] College of Computer and Information Technology, Henan Normal University, Xinxiang

Source:

Journal of Computers (Finland) | 2013 / Vol. 8 / No. 4

Keywords:

Attribute clustering; Correlation; Gene selection; Neighborhood mutual information; Significant multiple correlation

DOI:

10.4304/jcp.8.4.951-959

Abstract:

This paper proposes an attribute clustering algorithm that groups attributes into clusters in order to obtain meaningful modes from microarray data. First, the attribute clustering problem is analyzed, and neighborhood mutual information is introduced to address it. Next, an attribute clustering algorithm is presented that groups attributes into clusters by optimizing a criterion function derived from an information measure reflecting the correlation between attributes. Applying this method to gene expression data discovers meaningful clusters that help capture aspects of gene association patterns, so that significant genes carrying useful information for gene classification and identification can be selected. The proposed algorithm is then applied to six gene expression data sets and compared with several well-known gene selection methods. Experiments show that the greedy correlation measure based attribute clustering algorithm, denoted GCMACA, is better able to discover meaningful clusters of genes. By selecting a subset of genes that have a high significant multiple correlation value with the other genes in their clusters, informative genes can be obtained and gene expression profiles of different categories can also be identified. © 2013 ACADEMY PUBLISHER.
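The abstract only outlines GCMACA, so the Python sketch below is illustrative and is not the authors' implementation. It assumes one common formulation of neighborhood mutual information (neighborhood entropies computed over δ-neighborhoods, with the joint neighborhood taken under the max distance), a k-modes-style loop that assigns each gene to its most correlated mode gene and then re-picks each cluster's mode, a greedy farthest-first initialization, and the summed NMI of a gene with the other members of its cluster as a stand-in for the paper's significant multiple correlation. The function names `neighborhoods`, `nmi`, `greedy_attribute_clustering`, and `select_genes`, and the radius `delta`, are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def neighborhoods(values: Sequence[float], delta: float) -> List[Set[int]]:
    """delta-neighborhood of every sample on a single attribute (gene)."""
    n = len(values)
    return [{j for j in range(n) if abs(values[i] - values[j]) <= delta}
            for i in range(n)]


def nmi(x: Sequence[float], y: Sequence[float], delta: float) -> float:
    """Assumed neighborhood mutual information NH(X) + NH(Y) - NH(X, Y), with
    NH(X) = -(1/n) * sum_i log(|N_X(i)| / n). Under the max (Chebyshev) distance
    the joint neighborhood N_XY(i) is the intersection N_X(i) & N_Y(i)."""
    n = len(x)
    nx, ny = neighborhoods(x, delta), neighborhoods(y, delta)
    total = 0.0
    for i in range(n):
        joint = len(nx[i] & ny[i])  # >= 1, since sample i is its own neighbor
        total += math.log(len(nx[i]) * len(ny[i]) / (n * joint))
    return -total / n


def greedy_attribute_clustering(
    data: List[List[float]], k: int, delta: float, max_iter: int = 20
) -> Tuple[List[int], List[List[int]], List[List[float]]]:
    """Hypothetical sketch: cluster attributes (rows of `data`) around k mode
    attributes. Assumes 1 <= k <= number of attributes."""
    m = len(data)
    r = [[0.0] * m for _ in range(m)]  # pairwise NMI matrix, zero diagonal
    for a in range(m):
        for b in range(a + 1, m):
            r[a][b] = r[b][a] = nmi(data[a], data[b], delta)

    # Greedy start (assumption): the most globally correlated attribute, then
    # repeatedly the attribute least correlated with the modes chosen so far.
    modes = [max(range(m), key=lambda a: sum(r[a]))]
    while len(modes) < k:
        rest = [a for a in range(m) if a not in modes]
        modes.append(min(rest, key=lambda a: max(r[a][c] for c in modes)))

    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Assignment step: each non-mode attribute joins its most correlated mode.
        clusters = [[] for _ in modes]
        for a in range(m):
            if a in modes:
                clusters[modes.index(a)].append(a)
            else:
                best = max(range(k), key=lambda j: r[a][modes[j]])
                clusters[best].append(a)
        # Update step: the new mode is the member with the largest summed NMI.
        new_modes = [max(cl, key=lambda a: sum(r[a][b] for b in cl))
                     for cl in clusters]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, clusters, r


def select_genes(clusters: List[List[int]], r: List[List[float]],
                 per_cluster: int = 1) -> List[int]:
    """Keep the top genes of each cluster by summed NMI with the other members
    (an assumed stand-in for the significant multiple correlation)."""
    chosen: List[int] = []
    for cl in clusters:
        score = {a: sum(r[a][b] for b in cl if b != a) for a in cl}
        chosen.extend(sorted(cl, key=lambda a: -score[a])[:per_cluster])
    return chosen


if __name__ == "__main__":
    A = [0, 0, 0, 0, 1, 1, 1, 1]
    B = [0, 1, 0, 1, 0, 1, 0, 1]
    data = [A, [v + 0.01 for v in A], A, B, [v + 0.01 for v in B], B]
    modes, clusters, r = greedy_attribute_clustering(data, k=2, delta=0.1)
    print(clusters)                   # [[0, 1, 2], [3, 4, 5]]
    print(select_genes(clusters, r))  # [0, 3]
```

The brute-force pairwise NMI computation costs O(m²n²) for m genes and n samples, so this sketch only suits small inputs, and the radius δ is left as a free parameter.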

Pages: 951-959

Page count: 8
