A new feature selection using dynamic interaction
Cited: 5
Authors:
Li, Zhang
[1,2]
Affiliations:
[1] Jiangsu Univ Technol, Sch Comp Engn, Changzhou 213001, Jiangsu, Peoples R China
[2] Beijing Univ Posts & Telecommun, Minist Educ, Key Lab Trustworthy Distributed Comp & Serv, Beijing 100876, Peoples R China
Keywords:
Feature selection;
Feature interaction;
Feature relevance;
Feature redundancy;
Filter method;
MUTUAL INFORMATION;
RELEVANCE;
CLASSIFICATION;
DOI:
10.1007/s10044-020-00916-2
CLC number:
TP18 [Artificial Intelligence Theory];
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
With the continuous development of Internet technology, data are becoming increasingly complex and high-dimensional. Such high-dimensional data contain large numbers of redundant and irrelevant features, which pose great challenges to existing machine learning algorithms. Feature selection is an important research topic in machine learning, pattern recognition and data mining, and a key step in data preprocessing. Its goal is to find, within the original feature set, an optimal feature subset that improves classification accuracy and reduces learning time. Traditional feature selection algorithms tend to ignore features that have weak discriminating power individually but strong discriminating power when combined with others. Therefore, this paper proposes a new dynamic interaction feature selection (DIFS) algorithm. First, within the theoretical framework of interaction information, it redefines feature relevance, irrelevance and redundancy. Second, it derives formulas for computing interaction information. Finally, on eleven UCI data sets and with three different classifiers, namely KNN, SVM and C4.5, the DIFS algorithm improves classification accuracy over the full feature set by 3.2848% and reduces the number of selected features by 15.137 on average. Hence, the DIFS algorithm not only identifies relevant features effectively, but also identifies irrelevant and redundant features. Moreover, it effectively improves classification accuracy and reduces the feature dimensionality of the data sets.
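The abstract builds on interaction information, the standard three-way generalization of mutual information. As a rough illustration (a minimal sketch of the underlying quantity, not the paper's DIFS formulas, whose exact definitions are in the article), interaction information can be estimated for discrete features as I(X;Y;C) = I(X,Y;C) − I(X;C) − I(Y;C); a positive value signals the synergy the abstract describes, where two features are individually weak but jointly informative about the class:

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    """Shannon entropy (bits) of the joint distribution of discrete columns."""
    joint = list(zip(*cols))
    probs = np.array([c / len(joint) for c in Counter(joint).values()])
    return float(-np.sum(probs * np.log2(probs)))

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def interaction_info(x, y, c):
    """I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C).
    Positive: x and y together carry more information about the
    class c than the sum of their individual contributions."""
    i_xy_c = entropy(x, y) + entropy(c) - entropy(x, y, c)
    return i_xy_c - mutual_info(x, c) - mutual_info(y, c)

# XOR example: each feature alone is useless, together they determine c.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
c = [0, 1, 1, 0]          # c = x XOR y
print(mutual_info(x, c))          # 0.0  (x alone is irrelevant)
print(interaction_info(x, y, c))  # 1.0  (strong positive interaction)
```

The XOR case is exactly the kind of feature group a purely individual relevance filter would discard, which is the gap the DIFS algorithm targets.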
Pages: 203-215
Page count: 13