Incremental clustering based on Wasserstein distance between histogram models

Cited by: 0
Authors
Qian, Xiaotong [1 ]
Cabanes, Guenael [2 ]
Rastin, Parisa [2 ]
Guidani, Mohamed Alae [3 ]
Marrakchi, Ghassen [4 ]
Clausel, Marianne [2 ]
Grozavu, Nistor [1 ]
Affiliations
[1] CY Cergy Paris Univ, ETIS, UMR 8051, F-95000 Cergy, France
[2] Univ Lorraine, LORIA, UMR 7503, F-54500 Vandoeuvre Les Nancy, France
[3] Ecole Natl Super Mines, Campus Artem, F-54042 Nancy, France
[4] Univ Sorbonne Paris Nord, LIPN, UMR 7030, F-93430 Villetaneuse, France
Keywords
Unsupervised learning; Static and dynamic clustering; Large datasets; Data streams; Sliding windows; Histogram models; Wasserstein distance; STREAMING-DATA; CLASSIFIER; ALGORITHMS;
DOI
10.1016/j.patcog.2025.111414
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, we present a clustering framework for large datasets and real-time data streams. It combines a sliding window with histogram models to limit memory congestion, reduce computational complexity, and improve cluster quality in both static and dynamic clustering. Histogram models offer a simple way to characterize the probability distribution of each cluster, regardless of the distribution type, so the framework can be used efficiently with various conventional clustering algorithms. To cluster effectively across windows, we compare and merge clusters using a statistical measure based on the Wasserstein distance between their histograms.
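The cross-window merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single numeric feature, histograms that share the same equal-width bins, and a merge threshold chosen by the user. The function names (`wasserstein_1d`, `merge_clusters`) and the threshold value are hypothetical. Under these assumptions, the 1-Wasserstein distance between two histograms reduces to the bin width times the sum of absolute differences between their cumulative distributions.

```python
from itertools import accumulate


def normalize(counts):
    """Turn raw bin counts into probabilities that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def wasserstein_1d(counts_a, counts_b, bin_width=1.0):
    """1-Wasserstein distance between two histograms on shared, equal-width bins.

    Assumption: the mass of each bin sits at its center, so the distance is
    bin_width * sum(|CDF_a - CDF_b|) over the bins.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same bins")
    cdf_a = accumulate(normalize(counts_a))
    cdf_b = accumulate(normalize(counts_b))
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def merge_clusters(existing, incoming, threshold, bin_width=1.0):
    """Merge cluster histograms from a new window into the existing ones.

    Each incoming histogram is added to its nearest existing cluster when the
    distance is at most `threshold` (a hypothetical, user-chosen value);
    otherwise it starts a new cluster.
    """
    merged = [list(h) for h in existing]
    for hist in incoming:
        distances = [wasserstein_1d(hist, m, bin_width) for m in merged]
        best = min(range(len(merged)), key=distances.__getitem__, default=None)
        if best is not None and distances[best] <= threshold:
            # Summing counts keeps one compact model instead of the raw points.
            merged[best] = [x + y for x, y in zip(merged[best], hist)]
        else:
            merged.append(list(hist))
    return merged


# Two clusters from a first window, then two from the next window.
window_1 = [[8, 2, 0, 0], [0, 0, 3, 7]]
window_2 = [[7, 3, 0, 0], [0, 5, 5, 0]]
clusters = merge_clusters(window_1, window_2, threshold=0.5)
```

In this toy run the first incoming histogram is close to the first existing cluster and is absorbed into it, while the second is far from both and becomes a new cluster. Only bin counts are kept between windows, which is the memory saving the abstract alludes to.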
Pages: 21
Related papers
50 in total
  • [1] Local Histogram Based Segmentation Using the Wasserstein Distance
    Ni, Kangyu
    Bresson, Xavier
    Chan, Tony
    Esedoglu, Selim
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2009, 84 (01) : 97 - 111
  • [3] Ordinary Least Squares for Histogram Data Based on Wasserstein Distance
    Verde, Rosanna
    Irpino, Antonio
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 581 - 588
  • [4] Dynamic clustering of histogram data based on adaptive squared Wasserstein distances
    Irpino, Antonio
    Verde, Rosanna
    De Carvalho, Francisco de A. T.
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (07) : 3351 - 3366
  • [5] A Wasserstein distance-based spectral clustering method for transaction data analysis
    Zhu, Yingqiu
    Huang, Danyang
    Zhang, Bo
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 260
  • [6] DECWA : Density-Based Clustering using Wasserstein Distance
    El Malki, Nabil
    Cugny, Robin
    Teste, Olivier
    Ravat, Franck
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 2005 - 2008
  • [7] Evaluating the Performance of Climate Models Based on Wasserstein Distance
    Vissio, Gabriele
    Lembo, Valerio
    Lucarini, Valerio
    Ghil, Michael
    GEOPHYSICAL RESEARCH LETTERS, 2020, 47 (21)
  • [8] Wasserstein distance to independence models
    Celik, Turku Ozluem
    Jamneshan, Asgar
    Montufar, Guido
    Sturmfels, Bernd
    Venturello, Lorenzo
    JOURNAL OF SYMBOLIC COMPUTATION, 2021, 104 : 855 - 873
  • [9] Dynamic clustering of interval data using a Wasserstein-based distance
    Irpino, Antonio
    Verde, Rosanna
    PATTERN RECOGNITION LETTERS, 2008, 29 (11) : 1648 - 1658
  • [10] Comparing histogram data using a Mahalanobis-Wasserstein distance
    Verde, Rosanna
    Irpino, Antonio
    COMPSTAT 2008: PROCEEDINGS IN COMPUTATIONAL STATISTICS, 2008, : 77 - 89