Feature selection algorithm based on mutual information and lasso for microarray data

Cited by: 0
Authors
Zhongxin W. [1]
Gang S. [1, 2]
Jing Z. [3]
Jia Z. [1]
Affiliations
[1] School of Computer and Information Engineering, Fuyang Teachers College, Fuyang
[2] School of Computer and Information, Hefei University of Technology, Hefei
[3] Information & Telecommunication Branch, State Grid Anhui Electric Power Company, Hefei
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; Lasso; Microarray data; Mutual information;
DOI
10.2174/1874070701610010278
Abstract
With the development of microarray technology, gene expression experiments produce massive amounts of microarray data, which provide a new approach to the study of human disease. Because microarray data are high-dimensional, noisy, and redundant, it is difficult to mine knowledge from them thoroughly and accurately, and selecting informative genes is also very difficult. This paper therefore proposes a new feature selection algorithm for high-dimensional microarray data, which involves two main steps. In the first step, a mutual information value is calculated for every gene; according to these values, informative genes are selected as a candidate gene subset and irrelevant genes are filtered out. In the second step, an improved Lasso-based method selects informative genes from the candidate subset, with the aim of removing redundant genes. Experimental results show that the proposed algorithm selects fewer genes while offering better classification ability, stable performance, and strong generalization ability, making it an effective gene feature selection algorithm. © Zhongxin et al.
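The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the "improved" Lasso variant, so plain Lasso (solved by coordinate descent) stands in for it, and mutual information is assumed to be measured between each gene and the class label, estimated here with simple equal-width binning. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(values, labels, bins=3):
    """Estimate I(gene; class) in nats after equal-width binning (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint, px, py = Counter(zip(binned, labels)), Counter(binned), Counter(labels)
    return sum((c / n) * math.log(c * n / (px[b] * py[y]))
               for (b, y), c in joint.items())


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 on centred data."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    cols = [[v - sum(c) / n for v in c] for c in cols]   # centre each column
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]                       # residual for w = 0
    norms = [sum(v * v for v in c) / n for c in cols]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if norms[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(x * r for x, r in zip(cols[j], resid)) / n + norms[j] * w[j]
            new_w = soft_threshold(rho, alpha) / norms[j]
            delta = new_w - w[j]
            if delta:
                resid = [r - delta * x for r, x in zip(resid, cols[j])]
                w[j] = new_w
    return w


def select_genes(X, y, n_candidates, alpha):
    """Step 1: keep the top genes by mutual information. Step 2: Lasso on them."""
    p = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(p)]
    candidates = sorted(range(p), key=lambda j: scores[j], reverse=True)[:n_candidates]
    X_cand = [[row[j] for j in candidates] for row in X]
    w = lasso_coordinate_descent(X_cand, y, alpha)
    selected = [g for g, coef in zip(candidates, w) if coef != 0.0]
    return candidates, selected


# Synthetic stand-in for microarray data: 60 samples, 40 "genes".
# Gene 0 carries the class signal, gene 1 is a near copy (redundant), the rest are noise.
random.seed(0)
y = [0] * 30 + [1] * 30
X = []
for label in y:
    signal = 4.0 * label + random.gauss(0, 1)
    redundant = signal + random.gauss(0, 0.1)
    X.append([signal, redundant] + [random.gauss(0, 1) for _ in range(38)])

candidates, selected = select_genes(X, y, n_candidates=10, alpha=0.05)
print("candidates:", candidates)
print("selected:", selected)
```

The filter step is cheap and scores each gene independently, so it cannot detect redundancy between genes; the L1 penalty in the second step handles that by driving the coefficients of some candidates to zero. The candidate count and `alpha` are illustrative values that would need tuning on real data, for example by cross-validation.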
Pages: 278-286
Number of pages: 8
Related papers
14 items in total
[1] Kim Y.S., Street W.N., Menczer F., Data Mining: Opportunities and Challenges, (2003)
[2] Saeys Y., Inza I., Larrañaga P., A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 19, pp. 2507-2517, (2007)
[3] Wang Y.H., Makedon F.S., Ford J.C., Pearlman J., HykGene: A hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data, Bioinformatics, 21, 8, pp. 1530-1537, (2005)
[4] Golub T.R., Slonim D.K., Tamayo P., et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science, 286, 5439, pp. 531-537, (1999)
[5] Robnik-Šikonja M., Kononenko I., Theoretical and empirical analysis of ReliefF and RReliefF, Mach Learn, 53, 1, pp. 23-69, (2003)
[6] Hanczar B., Courtine M., Benis A., Hennegar C., Clement K., Zucker J.D., Improving classification of microarray data using prototype-based feature selection, ACM SIGKDD Explor Newsl, 5, 2, pp. 23-30, (2003)
[7] Tan F., Fu X.Z., Wang H., Zhang Y.Q., Bourgeois A., A hybrid feature selection approach for microarray gene expression data, Proceedings of the 6th International Conference on Computational Science, pp. 678-685, (2006)
[8] Wang S.-L., Wang J., Chen H.-W., et al., Heuristic breadth-first search algorithm for informative gene selection based on gene expression profiles, Chin J Comput, 31, 4, pp. 636-649, (2008)
[9] Chuang L.Y., Yang C.H., Li J.C., Yang C.H., A hybrid BPSO-CGA approach for gene selection and classification of microarray data, J Comput Biol, 19, 1, pp. 68-82, (2012)
[10] Zhiwen Y., Le L., Jane Y., et al., SC3: Triple spectral clustering based consensus clustering framework for class discovery from cancer gene expression profiles, IEEE/ACM Trans Comput Biol Bioinf, 9, 6, pp. 1751-1765, (2012)