Flexible density peak clustering for real-world data

Cited by: 2
Authors
Hou, Jian [1]
Lin, Houshen [1]
Yuan, Huaqiang [1]
Pelillo, Marcello [2, 3]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523808, Peoples R China
[2] Ca Foscari Univ, DAIS, I-30172 Venice, Italy
[3] Ca Foscari Univ, European Ctr Living Technol, I-30123 Venice, Italy
Funding
National Natural Science Foundation of China
Keywords
Clustering; Density peak; Real-world data; Number of clusters; FAST SEARCH; K-MEANS; FIND
DOI
10.1016/j.patcog.2024.110772
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In density-based clustering, the density peak algorithm has attracted much attention for its effectiveness and simplicity, and a large number of clustering approaches have been built on it. Some of these works require cluster centers to be selected manually from a decision graph, and this human involvement introduces uncertainty into the clustering results. To avoid human involvement, other algorithms rely on a user-specified number of clusters to determine cluster centers automatically. However, it is well known that accurately estimating the number of clusters is a long-standing difficulty in data clustering. In this paper we present a sequential density peak clustering algorithm that extracts clusters one by one, thereby determining the number of clusters automatically while avoiding manual selection of cluster centers. Starting from a density peak, our algorithm first generates an initial cluster surrounding the density peak, and then obtains the final cluster by expanding the initial cluster based on the relative density relationship among neighboring data points. With a peeling-off strategy, we obtain all the clusters sequentially. Our algorithm works well with clusters of Gaussian distribution and therefore has potential for clustering real-world data. Experiments on a large number of synthetic and real datasets, together with comparisons with existing algorithms, demonstrate the effectiveness of the proposed algorithm.
Pages: 13
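The sequential procedure outlined in the abstract (pick a density peak, form an initial cluster around it, expand it by comparing densities of neighboring points, peel the cluster off, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no formulas, so the density estimate (inverse mean distance to the `k` nearest neighbors), the expansion rule (`rho[j] >= ratio * rho[i]`), and the names `sequential_density_peak_sketch`, `k`, and `ratio` are all assumptions chosen only to make the idea concrete.

```python
import numpy as np
from collections import deque


def sequential_density_peak_sketch(X, k=4, ratio=0.5):
    """Peel clusters off one at a time; returns one integer label per point.

    Hypothetical stand-in for the procedure described in the abstract.
    Assumes X contains no duplicate points.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a small demo).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Column 0 of the sorted order is the point itself, so skip it.
    knn = np.argsort(dist, axis=1)[:, 1:k + 1]
    # Assumed density estimate: inverse mean distance to the k nearest neighbors.
    knn_dist = np.take_along_axis(dist, knn, axis=1)
    rho = 1.0 / (knn_dist.mean(axis=1) + 1e-12)

    labels = np.full(n, -1)
    cluster_id = 0
    while (labels == -1).any():
        unassigned = np.flatnonzero(labels == -1)
        # Step 1: the densest remaining point acts as the density peak.
        peak = unassigned[np.argmax(rho[unassigned])]
        # Initial cluster: the peak plus its still-unassigned nearest neighbors.
        members = [peak] + [j for j in knn[peak] if labels[j] == -1]
        labels[members] = cluster_id
        # Step 2: expand through neighbors whose density is not too far
        # below that of the member they are reached from (assumed rule).
        queue = deque(members)
        while queue:
            i = queue.popleft()
            for j in knn[i]:
                if labels[j] == -1 and rho[j] >= ratio * rho[i]:
                    labels[j] = cluster_id
                    queue.append(j)
        # Step 3: peel the finished cluster off and look for the next peak.
        cluster_id += 1
    return labels
```

In this sketch the number of clusters is simply the number of iterations of the outer loop, so it is an output rather than an input, which mirrors the motivation stated in the abstract. A real implementation would also need a policy for very small leftover groups, which this sketch labels as clusters of their own.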