Survey on Density Peak Clustering Algorithm

被引:0
|
作者
Chen Y. [1 ,2 ,3 ,4 ]
Shen L. [1 ]
Zhong C. [5 ]
Wang T. [1 ]
Chen Y. [1 ,2 ,3 ,4 ]
Du J. [1 ]
机构
[1] College of Computer Science and Technology, Huaqiao University, Xiamen, 361021, Fujian
[2] Beijing Key Laboratory of Big Data Technology for Food Safety (Beijing Technology and Business University), Beijing
[3] Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, 215006, Jiangsu
[4] Fujian Key Laboratory of Big Data Intelligence and Security (Huaqiao University), Xiamen, 361021, Fujian
[5] College of Information, Ningbo University, Ningbo, 315211, Zhejiang
来源
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2020年 / 57卷 / 02期
基金
中国国家自然科学基金;
关键词
Big data; Clustering algorithm; Data mining; Density clustering; Density peak;
D O I
10.7544/issn1000-1239.2020.20190104
中图分类号
学科分类号
摘要
DPeak(density peak) is a simple but effective clustering method. It is able to map data with arbitrary dimension onto a 2-dimensional space, and construct hierarchical relationship for all data points on the new reduction space. This makes it is easy to pick up some distinguished points (density peaks), each of which has high density and large distance from other regions of higher density. In addition, based on regarding theses density peaks as cluster centers and the hierarchical relationship, the algorithm provides two different ways to perform the final task of clustering, i.e., one is decision diagram that can interact with users, and the other is an automatic method. In this paper, we trace the development and application trends of DPeak in recent years, summarize and comb various improvements or variations of DPeak algorithm from the following aspects. Firstly, the principle of DPeak algorithm is introduced, and its position in the classification system of clustering algorithm is discussed as well. After comparing DPeak with several other main clustering algorithms, it is found that DPeak is highly similar to mean shift, and hence, we think that DPeak may be a special variant of mean shift. Secondly, some shortcomings of DPeak are discussed, such as high time complexity, lack of adaptability, low precision and inefficiency in high dimensional space etc., and then various improved algorithms are demonstrated in different categories. In addition, some applications of DPeak in different fields, such as natural language processing, biomedical analysis and optical applications etc., are presented and combed. Last but not least, we look forward to its future work based on the problems and challenges of the DPeak. © 2020, Science Press. All right reserved.
引用
收藏
页码:378 / 394
页数:16
相关论文
共 104 条
  • [1] Hilbert M., Lopez P., The world's technological capacity to store, communicate, and compute information, Science, 332, 6025, pp. 60-65, (2011)
  • [2] Rosenberger C., Chehdi K., Unsupervised clustering method with optimal estimation of the number of clusters: Application to image segmentation, Proc of the 15th Int Conf on Pattern Recognition, pp. 656-659, (2000)
  • [3] Frigui H., Krishnapuram R., A robust competitive clustering algorithm with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 5, pp. 450-465, (1999)
  • [4] Achanta R., Shaji A., Smith K., Et al., SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Transactions on Pattern Analysis and Machine Intelligence, 34, 11, pp. 2274-2282, (2012)
  • [5] Elhamifar E., Vidal R., Sparse subspace clustering, Proc of 2009 IEEE Conf on Computer Vision and Pattern Recognition, pp. 2790-2797, (2009)
  • [6] Li W., Godzik A., Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, 22, 13, pp. 1658-1659, (2006)
  • [7] King A.D., Przulj N., Jurisica I., Protein complex prediction via cost-based clustering, Bioinformatics, 20, 17, pp. 3013-3020, (2004)
  • [8] Huang D., Sherman B.T., Lempicki R.A., Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources, Nature Protocols, 4, 1, pp. 44-57, (2009)
  • [9] Rossant C., Kadir S.N., Goodman D.F.M., Et al., Spike sorting for large, dense electrode arrays, Nature Neuroscience, 19, 4, pp. 634-641, (2016)
  • [10] Moosmann F., Nowak E., Jurie F., Randomized clustering forests for image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 9, pp. 1632-1646, (2008)