Clustering with Missing Features: A Density-Based Approach

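Times cited: 10
Authors
Gao, Kun [1]
Khan, Hassan Ali [1]
Qu, Wenwen [1]
Affiliation
[1] East China Normal Univ, Sch Software Engn, Shanghai 200062, Peoples R China
Source
SYMMETRY-BASEL | 2022 / Vol. 14 / No. 01
Funding
National Natural Science Foundation of China
Keywords
clustering; incomplete data; density peak; imputation; K-NEAREST NEIGHBORS; IMPUTATION; ALGORITHM
DOI
10.3390/sym14010060
Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract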
Density clustering is widely used across research disciplines to uncover the structure of real-world datasets. Existing density clustering algorithms, however, work well only on complete datasets, whereas real-world datasets may have missing feature values due to technical limitations. Many imputation methods used together with density clustering cause the aggregation phenomenon. To solve this problem, a novel two-stage density peak clustering approach for data with missing features is proposed. First, the density peak clustering algorithm is applied to the data points with complete features, and the labeled core points, which represent the overall data distribution, are used to train a classifier. Second, a symmetrical FWPD (feature weighted penalty based dissimilarity) distance matrix is computed for the incomplete data points; the incomplete points are then imputed using this matrix and classified by the classifier. Experimental results show that the proposed approach performs well on both synthetic and real-world datasets.
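The second stage depends on the FWPD matrix. The sketch below shows one way to compute it with NumPy, following the feature weighted penalty based dissimilarity as we understand it from Datta, Bhattacharjee and Das (2018). The observed Euclidean distance over features present in both points is normalized by the largest such distance, then blended through a weight `alpha` with a penalty. The penalty is the weighted fraction of features not observed in both points, where each feature's weight is the number of points that observe it. Missing values are assumed to be coded as `NaN`. The imputation step is our own stand-in, not necessarily the paper's rule: each missing entry is filled with the mean of that feature over the `k` FWPD-nearest points that observe it. The names `fwpd_matrix` and `impute_knn_fwpd` and the defaults for `alpha` and `k` are hypothetical.

```python
import numpy as np


def fwpd_matrix(X: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Symmetric FWPD dissimilarity matrix for data with NaN-coded missing values.

    Sketch (assumption: our reading of Datta et al., 2018):
        delta(i, j) = (1 - alpha) * d(i, j) / d_max + alpha * p(i, j)
    d: Euclidean distance over features observed in both points,
    p: weighted fraction of features NOT observed in both points.
    """
    observed = ~np.isnan(X)                   # (n, m) mask of observed entries
    n, _ = X.shape
    w = observed.sum(axis=0).astype(float)    # w_l = number of points observing feature l
    Xz = np.where(observed, X, 0.0)           # zero-fill so masked arithmetic avoids NaN

    d = np.zeros((n, n))
    p = np.zeros((n, n))
    for i in range(n):
        common = observed[i] & observed       # (n, m): features observed in both i and j
        diff = (Xz[i] - Xz) * common          # only shared features contribute to d
        d[i] = np.sqrt((diff ** 2).sum(axis=1))
        p[i] = ((~common) * w).sum(axis=1) / w.sum()

    d_max = d.max()
    d_norm = d / d_max if d_max > 0 else d
    return (1.0 - alpha) * d_norm + alpha * p


def impute_knn_fwpd(X: np.ndarray, k: int = 3, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical imputation rule (not necessarily the paper's): fill each
    missing entry with the mean of that feature over the k FWPD-nearest
    points that observe it."""
    D = fwpd_matrix(X, alpha)
    observed = ~np.isnan(X)
    X_imp = X.copy()
    for i, j in zip(*np.nonzero(~observed)):
        donors = np.nonzero(observed[:, j])[0]          # points that observe feature j
        nearest = donors[np.argsort(D[i, donors])[:k]]  # k smallest FWPD values
        X_imp[i, j] = X[nearest, j].mean()
    return X_imp


# Usage: two well-separated groups; the last point is missing its second feature.
X = np.array([[0.0, 0.0],
              [0.1, 0.2],
              [5.0, 5.1],
              [5.2, np.nan]])
D = fwpd_matrix(X, alpha=0.5)
X_filled = impute_knn_fwpd(X, k=1)
print(X_filled[3, 1])  # 5.1, copied from its FWPD-nearest donor, point 2
```

Because the penalty term depends only on which features are observed, a point with missing features has a nonzero dissimilarity to itself (here `D[3, 3]` equals 3/14). FWPD is therefore symmetric but not a metric. The matrix loop costs O(n^2 m), so a blocked or fully vectorized version would suit larger datasets better.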
Pages: 16