Towards an Unsupervised Feature Selection Method for Effective Dynamic Features

Cited by: 8
Authors
Almusallam, Naif [1]
Tari, Zahir [2]
Chan, Jeffrey [2]
Fahad, Adil [3]
Alabdulatif, Abdulatif [4]
Al-Naeem, Mohammed [5]
Affiliations
[1] Imam Mohammad Ibn Saud Islamic Univ IMSIU, Sch Comp Sci, Riyadh 13318, Saudi Arabia
[2] RMIT Univ, Sch Comp Sci & Informat Technol, Melbourne, Vic 3001, Australia
[3] Albaha Univ, Sch Comp Sci & Informat Technol CS&IT, Al Bahah 65731, Saudi Arabia
[4] Qassim Univ, Comp Sci Dept, Buraydah 51452, Saudi Arabia
[5] King Faisal Univ, Coll Comp Sci & Informat Technol, Dept Comp Networks & Commun, Al Hasa 31982, Saudi Arabia
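Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Feature selection; streaming features; unsupervised learning
DOI
10.1109/ACCESS.2021.3082755
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract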
Dynamic-feature applications pose new obstacles for the selection of streaming features. Such applications have two defining characteristics: a) features are processed sequentially while the number of instances is fixed; and b) the feature space does not exist in advance. For example, in a text classification task for spam detection, new features (e.g., words) are generated dynamically and therefore need to be mined to filter out spam as they arrive, rather than waiting for all features to be collected. Traditional feature selection methods, which are not designed for streaming-feature applications, cannot be used in such an environment, as they require the full feature space in advance in order to statistically determine the representative features. Existing methods that address feature selection in dynamic-feature applications require class labels in order to select the representative features. However, most real-life data is unlabeled, and manual labeling is costly. In this paper, an efficient unsupervised feature selection method is proposed for streaming-feature applications in which the number of features increases while the number of instances remains fixed. In particular, Unsupervised Feature Selection for Dynamic Features (UFSSF) is developed to determine the representative streaming features without requiring prior knowledge of the data class labels or of the representative features. UFSSF extends k-means clustering to cumulatively determine whether a newly arrived feature should be selected as a representative streaming feature or discarded. Experimental results show that the method is accurate and runs efficiently compared with other benchmark methods.
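The select-or-discard decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' UFSSF algorithm: the abstract gives no details of the k-means extension, so the sketch swaps in a much simpler correlation-threshold redundancy filter. The class name `StreamingFeatureSelector`, the `offer` method, and the `threshold` parameter are all hypothetical names chosen for illustration. It only shows the streaming setting itself: features arrive one at a time over a fixed set of instances, no class labels are used, and each arrival is either kept as a representative or dropped.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


class StreamingFeatureSelector:
    """Illustrative stand-in (assumption): keeps a feature only if it is not
    strongly correlated with any representative feature kept so far."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold      # hypothetical redundancy cut-off
        self.representatives = {}       # feature name -> values over the fixed instances

    def offer(self, name, values):
        """Process one newly arrived feature; return True if selected, False if discarded."""
        for rep_values in self.representatives.values():
            if abs(pearson(values, rep_values)) >= self.threshold:
                return False            # redundant with an existing representative
        self.representatives[name] = list(values)
        return True


# Four fixed instances; features arrive sequentially and no labels are used.
selector = StreamingFeatureSelector(threshold=0.9)
stream = [
    ("f1", [1.0, 2.0, 3.0, 4.0]),
    ("f2", [2.0, 4.0, 6.0, 8.0]),      # a scaled copy of f1, hence redundant
    ("f3", [1.0, -1.0, 1.0, -1.0]),    # only weakly correlated with f1
]
decisions = [selector.offer(name, values) for name, values in stream]
print(decisions)
print(list(selector.representatives))
```

The decision for each feature depends only on the representatives kept so far, so the full feature space is never needed in advance. A clustering-based method such as the one the abstract describes would instead compare each arrival against cluster structure built from earlier features.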
Pages: 77149-77163
Number of pages: 15