Sparse PCA via l2,p-Norm Regularization for Unsupervised Feature Selection

Cited: 48
Authors
Li, Zhengxin [1 ,2 ,3 ]
Nie, Feiping [2 ,3 ]
Bian, Jintang [2 ,3 ]
Wu, Danyang [2 ,3 ]
Li, Xuelong [2 ,3 ]
Affiliations
[1] Air Force Engn Univ, Coll Equipment Management & UAV Engn, Xian 710051, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[3] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised feature selection; principal component analysis; l(2,p)-norm; sparse learning; CLASSIFICATION;
DOI
10.1109/TPAMI.2021.3121329
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In the field of data mining, handling high-dimensional data is an unavoidable topic. Because it does not rely on labels, unsupervised feature selection has attracted considerable attention. The performance of spectral-based unsupervised methods depends on the quality of the constructed similarity matrix, which is used to depict the intrinsic structure of the data. However, real-world data often contain many noisy features, so a similarity matrix constructed from the original data cannot be completely reliable. Worse still, the size of the similarity matrix grows rapidly with the number of samples, which significantly increases the computational cost. To solve this problem, a simple and efficient unsupervised model is proposed to perform feature selection. We formulate PCA as a reconstruction error minimization problem and incorporate an l(2,p)-norm regularization term to make the projection matrix sparse. The learned row-sparse and orthogonal projection matrix is used to select discriminative features. Then, we present an efficient optimization algorithm to solve the proposed unsupervised model, and theoretically analyze its convergence and computational complexity. Finally, experiments on both synthetic and real-world data sets demonstrate the effectiveness of the proposed method.
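The abstract describes minimizing PCA reconstruction error with an l(2,p)-norm penalty on the rows of an orthogonal projection matrix, then ranking features by the row norms of that matrix. The NumPy sketch below illustrates this idea with a simple iteratively reweighted eigendecomposition scheme; the function name, the reweighting update, and all parameter defaults are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np

def sparse_pca_feature_select(X, k, p=1.0, lam=1.0, n_iter=50, eps=1e-8):
    """Rank features via a row-sparse orthogonal projection (illustrative sketch).

    Model from the abstract (notation assumed):
        min_W ||X - X W W^T||_F^2 + lam * ||W||_{2,p},  s.t. W^T W = I,
    where ||W||_{2,p} = sum_i ||w_i||_2^p over the rows w_i of W.
    Solved here by iterative reweighting: with row norms fixed, the penalty
    is replaced by the quadratic surrogate tr(W^T D W), D_ii = (p/2)||w_i||^(p-2),
    so each step reduces to a top-k eigenvector problem.
    """
    X = X - X.mean(axis=0)                     # center the data
    d = X.shape[1]
    rng = np.random.default_rng(0)
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random orthogonal init
    G = X.T @ X                                # Gram matrix, computed once
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1) + eps
        D = (p / 2.0) * row_norms ** (p - 2.0)         # reweighting diagonal
        # With D fixed, the objective becomes max tr(W^T (G - lam*diag(D)) W)
        # subject to W^T W = I: take eigenvectors of the k largest eigenvalues.
        _, vecs = np.linalg.eigh(G - lam * np.diag(D))
        W = vecs[:, -k:]
    scores = np.linalg.norm(W, axis=1)         # row norms score feature importance
    return np.argsort(scores)[::-1]            # feature indices, best first
```

Near-zero rows of W receive very large weights in D, which drives them further toward zero on the next step; this is the mechanism by which the l(2,p) penalty (for p <= 1 especially) produces row sparsity, leaving only the selected features with nonzero rows.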
Pages: 5322-5328
Page count: 7
References
27 records in total
[1]   Principal component analysis [J].
Abdi, Herve ;
Williams, Lynne J. .
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2010, 2 (04) :433-459
[2]  
[Anonymous], 1998, TECH REP
[3]  
Cai D., 2010, P 16 ACM SIGKDD INT, P333
[4]   Convex Sparse PCA for Unsupervised Feature Learning [J].
Chang, Xiaojun ;
Nie, Feiping ;
Yang, Yi ;
Zhang, Chengqi ;
Huang, Heng .
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2016, 11 (01)
[5]   Petrophysics and mineral exploration: a workflow for data analysis and a new interpretation framework [J].
Dentith, Michael ;
Enkin, Randolph J. ;
Morris, William ;
Adams, Cameron ;
Bourne, Barry .
GEOPHYSICAL PROSPECTING, 2020, 68 (01) :178-199
[6]  
Guo J, 2018, AAAI CONF ARTIF INTE, P2232
[7]  
He X., 2005, Adv Neural Inf Proc Syst, V18, P507
[8]   Recursive Nearest Agglomeration (ReNA): Fast Clustering for Approximation of Structured Signals [J].
Hoyos-Idrobo, Andres ;
Varoquaux, Gael ;
Kahn, Jonas ;
Thirion, Bertrand .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2019, 41 (03) :669-681
[9]   Approximate Sparse Multinomial Logistic Regression for Classification [J].
Kayabol, Koray .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (02) :490-493
[10]   Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks [J].
Khan, J ;
Wei, JS ;
Ringnér, M ;
Saal, LH ;
Ladanyi, M ;
Westermann, F ;
Berthold, F ;
Schwab, M ;
Antonescu, CR ;
Peterson, C ;
Meltzer, PS .
NATURE MEDICINE, 2001, 7 (06) :673-679