Incremental clustering based on Wasserstein distance between histogram models

Authors
Qian, Xiaotong [1]
Cabanes, Guenael [2]
Rastin, Parisa [2]
Guidani, Mohamed Alae [3]
Marrakchi, Ghassen [4]
Clausel, Marianne [2]
Grozavu, Nistor [1]
Affiliations
[1] CY Cergy Paris Univ, ETIS, UMR 8051, F-95000 Cergy, France
[2] Univ Lorraine, LORIA, UMR 7503, F-54500 Vandoeuvre Les Nancy, France
[3] Ecole Natl Super Mines, Campus Artem, F-54042 Nancy, France
[4] Univ Sorbonne Paris Nord, LIPN, UMR 7030, F-93430 Villetaneuse, France
Keywords
Unsupervised learning; Static and dynamic clustering; Large datasets; Data streams; Sliding windows; Histogram models; Wasserstein distance; STREAMING-DATA; CLASSIFIER; ALGORITHMS;
DOI
10.1016/j.patcog.2025.111414
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present an innovative clustering framework designed for large datasets and real-time data streams. It uses a sliding window and histogram models to address memory congestion while reducing computational complexity and improving cluster quality in both static and dynamic clustering. The framework provides a simple way to characterize the probability distribution of each cluster through a histogram model, regardless of the distribution type. This allows it to be used efficiently with various conventional clustering algorithms. To cluster effectively across windows, we use a statistical measure that compares and merges clusters based on the Wasserstein distance between their histograms.
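As a rough illustration of the merging idea described in the abstract, the Python sketch below summarizes two one-dimensional clusters as normalized histograms on shared, equal-width bins and merges them when their 1-Wasserstein distance falls below a threshold. The function names, the shared-bin assumption, and the threshold are illustrative assumptions and are not taken from the paper, which may build its histograms and merge test differently.

```python
from itertools import accumulate


def histogram(values, lo, hi, n_bins):
    """Normalized histogram of `values` on n_bins equal-width bins over [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # The right edge `hi` is folded into the last bin.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside [lo, hi]")
    return [c / total for c in counts]


def wasserstein_1d(p, q, bin_width):
    """1-Wasserstein distance between two histograms on the same equal-width bins.

    With mass placed at bin centers, the distance equals the bin width times
    the summed absolute difference of the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # The last cumulative value is 1 for both histograms, so it is skipped.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p[:-1], cdf_q[:-1]))


def should_merge(p, q, bin_width, threshold):
    """Hypothetical merge rule: merge when the distance is below `threshold`."""
    return wasserstein_1d(p, q, bin_width) < threshold


# Two clusters from consecutive windows (made-up values for illustration).
old_cluster = histogram([0.5, 1.5, 1.5, 2.5], lo=0.0, hi=3.0, n_bins=3)
new_cluster = histogram([0.4, 1.2, 1.8, 2.9], lo=0.0, hi=3.0, n_bins=3)
print(should_merge(old_cluster, new_cluster, bin_width=1.0, threshold=0.5))
```

Keeping only the bin masses per cluster, rather than the raw points, is what lets a window be discarded once it has been summarized. The bin count and threshold here are tuning choices that would need to be set for the data at hand.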
Pages: 21