GOLFS: feature selection via combining both global and local information for high dimensional clustering

Authors
Xing, Zhaoyu [1]
Wan, Yang [3]
Wen, Juan [2]
Zhong, Wei [2]
Affiliations
[1] Xiamen Univ, Paula & Gregory Chow Inst Studies Econ, Xiamen, Peoples R China
[2] Xiamen Univ, Sch Econ, Dept Stat & Data Sci, Xiamen, Peoples R China
[3] ByteDance Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; High dimensionality; ℓ2,1-norm; Manifold learning; Regularized self-representation; Spectral clustering; UNSUPERVISED FEATURE-SELECTION; VARIABLE SELECTION; ALGORITHMS; REGRESSION; OBJECTS
DOI
10.1007/s00180-023-01393-x
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
It is important to identify the discriminative features for high dimensional clustering. However, because cluster labels are unavailable, the regularization methods developed for supervised feature selection cannot be applied directly. To learn pseudo labels and select discriminative features simultaneously, we propose a new unsupervised feature selection method for high dimensional clustering problems, named GlObal and Local information combined Feature Selection (GOLFS). GOLFS combines the local geometric structure of the samples, captured via manifold learning, with their global correlation structure, captured via regularized self-representation, to select the discriminative features. Exploiting both sources of information improves the accuracy of both feature selection and clustering. In addition, we propose an iterative algorithm to solve the resulting optimization problem and prove its convergence. Simulations and two real data applications demonstrate the excellent finite-sample performance of GOLFS in both feature selection and clustering.
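The abstract names three ingredients: an ℓ2,1-norm penalty, a manifold (graph Laplacian) term for local structure, and regularized self-representation for global structure. The following is a minimal, dependency-free sketch of how such ingredients can be combined and solved by iterative reweighting. It is NOT the GOLFS objective or algorithm: GOLFS also learns pseudo labels, and the paper applies self-representation to samples, whereas this sketch assumes a generic feature-level objective ||X - XW||_F^2 + alpha * tr(W^T X^T L X W) + lam * ||W||_{2,1}, a binary symmetric k-nearest-neighbour graph, and the standard reweighted update W = (X^T X + alpha X^T L X + lam D)^(-1) X^T X with D_ii = 1 / (2 ||w_i||). All function and parameter names (select_features, lam, alpha, k, iters) are hypothetical.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def solve(A, B):
    """Solve A W = B for W by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def l21_norm(W):
    # Sum of the Euclidean norms of the rows of W (encourages whole rows to vanish)
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)

def knn_laplacian(X, k):
    # Unnormalized Laplacian L = D - A of a binary, symmetric k-NN graph over the samples (local structure)
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    return [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]

def select_features(X, lam=1.0, alpha=0.1, k=3, iters=30, eps=1e-8):
    """Hypothetical sketch: rank features by the row norms of an l2,1-regularized
    self-representation matrix W with a graph-Laplacian smoothness term."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)                              # global term: X ~ X W
    XtLX = matmul(matmul(Xt, knn_laplacian(X, k)), X)  # local manifold term
    d = len(XtX)
    W = [[float(i == j) for j in range(d)] for i in range(d)]  # start from the identity
    for _ in range(iters):
        # Reweighting: D_ii = 1 / (2 ||w_i||), smoothed by eps to avoid division by zero
        Dg = [1.0 / (2.0 * math.sqrt(sum(v * v for v in W[i]) + eps)) for i in range(d)]
        A = [[XtX[i][j] + alpha * XtLX[i][j] + (lam * Dg[i] if i == j else 0.0)
              for j in range(d)] for i in range(d)]
        W = solve(A, XtX)
    scores = [math.sqrt(sum(v * v for v in row)) for row in W]
    ranking = sorted(range(d), key=lambda j: -scores[j])
    return ranking, scores, W

# Toy data (6 samples x 3 features), made up for illustration only
X = [[1.0, 2.0, 0.5], [1.1, 2.1, -0.3], [0.9, 1.9, 0.2],
     [-1.0, -2.0, 0.1], [-1.2, -2.1, -0.4], [-0.8, -1.9, 0.3]]
ranking, scores, W = select_features(X)
print(ranking, [round(s, 3) for s in scores])
```

Features whose rows of W keep large norms are the ones the reconstruction relies on most, and lam controls how strongly the remaining rows are pushed toward zero. Because the sketch omits the pseudo-label learning that GOLFS performs, its rankings should not be read as reproducing the paper's method or results.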
Pages: 2651-2675
Number of pages: 25