UFSSF - An Efficient Unsupervised Feature Selection for Streaming Features

Cited by: 5
Authors
Almusallam, Naif [1, 2]
Tari, Zahir [1]
Chan, Jeffrey [1]
AlHarthi, Adil [3]
Affiliations
[1] Royal Melbourne Inst RMIT, Melbourne, Vic, Australia
[2] Al Imam Muhammad Bin Saud Islamic Univ IMSIU, Riyadh, Saudi Arabia
[3] Albaha Univ, Albaha, Saudi Arabia
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II | 2018 / Vol. 10938
DOI
10.1007/978-3-319-93037-4_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Streaming-feature applications pose challenges for feature selection. In such dynamic-feature applications: (a) features are generated sequentially and processed one by one upon arrival, while the number of instances (points) remains fixed; and (b) the complete feature space is not known in advance. Existing approaches require class labels as a guide to select the representative features. However, in real-world applications most data are not labeled and, moreover, manual labeling is costly. A new algorithm, called Unsupervised Feature Selection for Streaming Features (UFSSF), is proposed in this paper to select representative features in streaming-feature applications without the need to know the features or class labels in advance. UFSSF extends the k-means clustering algorithm to include linearly dependent similarity measures so as to incrementally decide whether to add the newly arrived feature to the existing set of representative features. Features that are not representative are discarded. Experimental results indicate that UFSSF achieves significantly better prediction accuracy and running time than the baseline approaches.
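To make the streaming setting concrete, here is a minimal Python sketch of the incremental keep-or-discard decision described in the abstract. It is not the UFSSF algorithm: the k-means extension is omitted entirely, and the absolute Pearson correlation stands in as one possible linear-dependence similarity. The names `abs_pearson` and `select_streaming` and the `threshold` value of 0.9 are assumptions made for illustration only.

```python
import numpy as np


def abs_pearson(a, b):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return abs(float(a @ b) / denom)


def select_streaming(feature_stream, threshold=0.9):
    """Process features one at a time over a fixed set of instances.

    A newly arrived feature is kept as a representative only if it is not
    strongly linearly dependent on any feature kept so far (hypothetical
    rule, not the paper's clustering-based criterion).
    """
    representatives = []  # (arrival index, feature vector) pairs
    for idx, feature in enumerate(feature_stream):
        feature = np.asarray(feature, dtype=float)
        if all(abs_pearson(feature, rep) < threshold for _, rep in representatives):
            representatives.append((idx, feature))
        # Otherwise the feature is treated as redundant and discarded.
    return [idx for idx, _ in representatives]
```

In this sketch no class labels are used and the full feature space is never needed up front: each decision depends only on the arriving feature and the representatives retained so far.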
Pages: 493-505
Page count: 13