Feature selection based on the measurement of correlation information entropy

Cited by: 0
Authors
Dong H. [1 ]
Teng X. [1 ]
Yang X. [1 ]
Affiliation
[1] College of Computer Science and Technology, Harbin Engineering University, Harbin
Funding
National Natural Science Foundation of China
Keywords
Correlation information entropy; Correlation matrix; Feature selection; Group effect; Multivariable correlation;
DOI
10.7544/issn1000-1239.2016.20160172
Abstract
Feature selection aims to select a smaller feature subset from the original feature set, such that the subset provides approximately the same or better performance in data mining and machine learning. Because it does not transform the physical characteristics of the features, a smaller feature set also yields a more interpretable model. Traditional information-theoretic methods tend to measure feature relevance and redundancy separately, ignoring the combined effect of the feature subset as a whole. In this paper, correlation information entropy, a technique from data fusion, is applied to feature selection. Based on this measure, we quantify the degree of independence and redundancy among features. A correlation matrix is then constructed from the mutual information between features and their class labels and between feature pairs. Furthermore, to account for the multivariable correlation among the features in a subset, the eigenvalues of the correlation matrix are computed. On this basis, a feature-ranking algorithm and an adaptive feature-subset selection algorithm with a tuning parameter are proposed. Experimental results demonstrate the effectiveness and efficiency of the proposed algorithms on classification tasks. © 2016, Science Press. All rights reserved.
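As a rough illustration of the idea described in the abstract, the sketch below builds a correlation matrix from pairwise normalized mutual information between discrete features and scores the subset by the entropy of the matrix's eigenvalue spectrum. This is a minimal sketch of the general correlation-information-entropy construction from the data-fusion literature, not the paper's exact algorithm; the normalization by feature entropies and the log base are assumptions. The score approaches 1 when features are mutually independent (correlation matrix near identity) and 0 when they are fully redundant (rank-one correlation matrix).

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information (in bits) of two discrete variables via a joint histogram."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x (column vector)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y (row vector)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def correlation_info_entropy(X):
    """Correlation information entropy of a feature subset X (samples x features).

    Off-diagonal entries are MI(Xi, Xj) normalized by sqrt(H(Xi) * H(Xj)),
    using MI(X, X) = H(X); the score is the Shannon entropy (base n) of the
    normalized eigenvalue spectrum of that matrix.
    """
    n = X.shape[1]
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            hi = mutual_info(X[:, i], X[:, i])   # H(Xi)
            hj = mutual_info(X[:, j], X[:, j])   # H(Xj)
            mij = mutual_info(X[:, i], X[:, j])
            R[i, j] = R[j, i] = mij / np.sqrt(hi * hj) if hi > 0 and hj > 0 else 0.0
    lam = np.linalg.eigvalsh(R) / n              # eigenvalues, normalized to sum to 1
    lam = lam[lam > 1e-12]                       # drop numerical zeros
    return float(-(lam * (np.log(lam) / np.log(n))).sum())
```

Under this score, a subset with two identical columns has eigenvalues {0, 2}, giving entropy 0 (fully redundant), while two independent columns give the identity matrix and entropy 1; a subset-selection loop would then prefer candidate subsets whose entropy stays high while class-label relevance is preserved.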
Pages: 1684-1695
Number of pages: 11