Half-Quadratic Minimization for Unsupervised Feature Selection on Incomplete Data

Cited by: 61
Authors
Shen, Heng Tao [1 ,2 ]
Zhu, Yonghua [3 ]
Zheng, Wei [3 ]
Zhu, Xiaofeng [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Technol, Chengdu 611731, Peoples R China
[3] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Data models; Minimization; Data analysis; Robustness; Analytical models; Machine learning; Feature selection; half-quadratic minimization; incomplete data; robust statistics; sparse learning;
DOI
10.1109/TNNLS.2020.3009632
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Unsupervised feature selection (UFS) is a popular technique for reducing the dimensionality of high-dimensional data. Previous UFS methods were often designed under the assumption that all information in the data set is observed. However, incomplete data sets containing unobserved information are common in real applications, especially in industry, so these existing UFS methods are limited when conducting feature selection on incomplete data. Moreover, most existing UFS methods did not consider sample importance for feature selection, i.e., that different samples carry different importance; as a result, the constructed UFS models easily suffer from the influence of outliers. To address these issues, this article investigates a new method for conducting UFS on incomplete data sets. Specifically, the proposed method deals with unobserved information by using an indicator matrix to filter it out of the feature-selection process, and reduces the influence of outliers by employing the half-quadratic minimization technique to automatically assign small or even zero weights to outliers and large weights to important samples. This article further designs an alternating optimization strategy to optimize the proposed objective function and proves, both theoretically and experimentally, the convergence of the proposed optimization strategy. Experimental results on both real and synthetic incomplete data sets verified the effectiveness of the proposed method compared with previous methods, in terms of clustering performance in the low-dimensional space of the high-dimensional data.
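The two ideas in the abstract — masking unobserved entries with an indicator matrix and down-weighting outlier samples via half-quadratic minimization — can be illustrated on a toy problem. The sketch below is not the paper's actual objective or algorithm; it fits a simple mean model with a Welsch-type half-quadratic weight (all names and the choice of estimator are illustrative assumptions):

```python
import numpy as np

def hq_weighted_fit(X, mask, n_iter=20, sigma=1.0):
    """Illustrative half-quadratic reweighting on incomplete data.

    X    : (n, d) data matrix; unobserved entries may hold any value.
    mask : (n, d) boolean indicator matrix, True where the entry is observed.

    Fits a mean vector (a toy stand-in for the feature-selection model),
    ignoring unobserved entries via `mask` and down-weighting outlier
    samples with Welsch-type half-quadratic weights.
    """
    n, d = X.shape
    w = np.ones(n)          # per-sample weights, updated each iteration
    mu = np.zeros(d)
    for _ in range(n_iter):
        # Weighted mean computed over observed entries only.
        W = w[:, None] * mask
        mu = (W * np.where(mask, X, 0.0)).sum(0) / np.maximum(W.sum(0), 1e-12)
        # Per-sample RMS residual over each sample's observed entries.
        r = np.sqrt((mask * (X - mu) ** 2).sum(1) / np.maximum(mask.sum(1), 1))
        # Half-quadratic (Welsch) weight: small residual -> weight near 1,
        # large residual (outlier) -> weight near 0.
        w = np.exp(-(r ** 2) / (2 * sigma ** 2))
    return mu, w
```

In the alternating scheme, the weights `w` are closed-form given the model (here, `mu`), and the model update is a weighted least-squares step given `w`, which mirrors the half-quadratic minimization structure the abstract describes.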
Pages: 3122-3135
Page count: 14