A sub-concept-based feature selection method for one-class classification

Cited by: 2
Authors
Liu, Zhen [1, 2, 5]
Japkowicz, Nathalie [2]
Wang, Ruoyu [3, 4]
Liu, Li [2, 6]
Affiliations
[1] Guangdong Pharmaceut Univ, Sch Med Informat Engn, Guangzhou 510006, Peoples R China
[2] Amer Univ, Dept Comp Sci, Washington, DC 20016 USA
[3] South China Univ Technol, Informat & Network Engn & Res Ctr, Guangzhou 510041, Peoples R China
[4] Commun & Comp Network Lab Guangdong, Guangzhou 510041, Peoples R China
[5] Guangdong Prov Precise Med & Big Data Engn Techno, Guangzhou 510006, Peoples R China
[6] Huizhou Univ, Dept Comp Sci, Huizhou 516007, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
One-class classification; Filter-based feature selection; Sub-concept; Multimodal data; Outlier detection; Cyber security; Data complexity
DOI
10.1007/s00500-020-04828-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Like binary classification methods, one-class classification methods can benefit from feature selection. However, feature selection algorithms designed for binary or multi-class problems are not applicable to one-class classification, since only one class of instances is available. Few techniques have been proposed so far for feature selection in one-class classification. This paper focuses on designing a filter-based feature selection method for one-class classification. Our approach is based on the observation that for some tasks, such as outlier detection and anomaly detection, the training data (normal data) may contain multiple sub-concepts. These sub-concepts are a source of data complexity. Our approach searches for the features that make the instances of each sub-concept more compact, so as to reduce the data complexity. It first finds the sub-concepts using a clustering algorithm with a fixed number of clusters and then applies combined feature measures to evaluate the relevance between each feature and the sub-concepts. A fixed number of features, those with the highest relevance scores, are selected as a feature subset. During the search, the Davies-Bouldin Index (DBI) is used to assess the data complexity of the sub-concepts obtained with different numbers of clusters. The feature subset with the lowest DBI is selected as the final feature subset. Experiments on UCI benchmark and cyber security datasets demonstrate that our feature selection algorithm can select relevant features and improve the performance of one-class classification on multimodal data.
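The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the clustering algorithm or the "combined feature measures", so plain k-means and a between-cluster over within-cluster variance ratio are used here as assumed stand-ins. Computing the Davies-Bouldin Index on the data restricted to the selected features, while reusing the same cluster labels, is likewise an assumption. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random
from statistics import mean


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of rows; returns one label per row."""
    rng = random.Random(seed)
    centers = rng.sample(X, k)  # initialise centers from k distinct rows
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def group_by_label(values, labels):
    """Collect values into a dict keyed by cluster label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return groups


def relevance(X, labels, j):
    """Assumed stand-in score: between/within cluster variance of feature j."""
    col = [x[j] for x in X]
    overall = mean(col)
    groups = group_by_label(col, labels)
    between = sum(len(vs) * (mean(vs) - overall) ** 2 for vs in groups.values())
    within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    return between / (within + 1e-12)


def davies_bouldin(X, labels):
    """Davies-Bouldin Index; lower means more compact, better separated clusters."""
    groups = group_by_label(X, labels)
    if len(groups) < 2:
        return float("inf")
    cents = {lab: [mean(col) for col in zip(*pts)] for lab, pts in groups.items()}
    scatter = {lab: mean([math.dist(p, cents[lab]) for p in pts])
               for lab, pts in groups.items()}
    worst = [max((scatter[a] + scatter[b]) / math.dist(cents[a], cents[b])
                 for b in groups if b != a)
             for a in groups]
    return mean(worst)


def select_features(X, k_values=(2, 3, 4), n_features=2, seed=0):
    """Try each cluster count, keep the top-scoring subset with the lowest DBI."""
    best = None
    for k in k_values:
        labels = kmeans(X, k, seed=seed)  # sub-concepts = clusters
        scores = [relevance(X, labels, j) for j in range(len(X[0]))]
        ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
        subset = sorted(ranked[:n_features])
        projected = [[x[j] for j in subset] for x in X]
        dbi = davies_bouldin(projected, labels)
        if best is None or dbi < best[0]:
            best = (dbi, k, subset)
    return best


# Synthetic "normal" data: three sub-concepts in features 0 and 1,
# while features 2 and 3 are pure noise.
rng = random.Random(42)
blobs = [(0, 0), (2, 12), (14, 2)]
X = [[rng.gauss(cx, 1), rng.gauss(cy, 1), rng.gauss(0, 0.3), rng.gauss(0, 0.3)]
     for cx, cy in blobs for _ in range(30)]

dbi, k, subset = select_features(X)
print(f"k={k}, selected features={subset}, DBI={dbi:.3f}")
```

On this toy data the two informative features should rank above the noise features for every candidate cluster count, and the DBI comparison then decides which cluster count's subset is kept.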
Pages: 7047-7062
Page count: 16