A Framework for Clustering Categorical Time-Evolving Data

Cited by: 43
Authors
Cao, Fuyuan [1]
Liang, Jiye [1]
Bai, Liang [1]
Zhao, Xingwang [1]
Dang, Chuangyin [2]
Affiliations
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Peoples R China
[2] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Categorical time-evolving data; clusters relationship analysis; data labeling; drifting-concept detecting; K-MEANS ALGORITHM; ROUGH; UNCERTAINTY; REDUCTION; SETS;
DOI
10.1109/TFUZZ.2010.2050891
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A fundamental assumption often made in unsupervised learning is that the problem is static, i.e., the description of the classes does not change with time. However, many practical clustering tasks involve changing environments. It is hence recognized that methods and techniques for analyzing evolving trends in changing environments are of increasing interest and importance. Although the problem of clustering numerical time-evolving data is well explored, the problem of clustering categorical time-evolving data remains a challenging issue. In this paper, we propose a generalized clustering framework for categorical time-evolving data, which is composed of three algorithms: a drifting-concept detecting algorithm that detects the difference between the current sliding window and the last sliding window, a data-labeling algorithm that decides the most appropriate cluster label for each object of the current sliding window based on the clustering results of the last sliding window, and a cluster-relationship-analysis algorithm that analyzes the relationship between clustering results at different time stamps. The time-complexity analysis indicates that these proposed algorithms are effective for large datasets. Experiments on a real dataset show that the proposed framework not only accurately detects the drifting concepts but also attains clustering results of better quality. Furthermore, compared with the other framework, the proposed one needs fewer parameters, which is favorable for specific applications.
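The sliding-window comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' drifting-concept detecting algorithm: the abstract does not give its distance measure, so the sketch swaps in an average total-variation distance between per-attribute value frequencies. The names `attribute_distributions`, `window_distance`, and `detect_drift`, as well as the `threshold` parameter, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, ...]  # one categorical object; one value per attribute


def attribute_distributions(window: Sequence[Record]) -> List[Dict[str, float]]:
    """Relative frequency of each value, computed separately per attribute."""
    n_attrs = len(window[0])
    dists = []
    for j in range(n_attrs):
        counts = Counter(rec[j] for rec in window)
        dists.append({v: c / len(window) for v, c in counts.items()})
    return dists


def window_distance(prev: Sequence[Record], curr: Sequence[Record]) -> float:
    """Average total-variation distance between two windows (assumed measure).

    Returns a value in [0, 1]: 0 for identical value frequencies,
    1 when the two windows share no attribute values at all.
    """
    p = attribute_distributions(prev)
    q = attribute_distributions(curr)
    total = 0.0
    for pj, qj in zip(p, q):
        values = set(pj) | set(qj)
        total += 0.5 * sum(abs(pj.get(v, 0.0) - qj.get(v, 0.0)) for v in values)
    return total / len(p)


def detect_drift(stream: Sequence[Record], window_size: int,
                 threshold: float) -> List[bool]:
    """Flag drift between each pair of consecutive non-overlapping windows."""
    windows = [stream[i:i + window_size]
               for i in range(0, len(stream) - window_size + 1, window_size)]
    return [window_distance(a, b) > threshold
            for a, b in zip(windows, windows[1:])]


if __name__ == "__main__":
    # Two stable windows followed by a window with entirely new values.
    stream = [("a", "x")] * 8 + [("b", "y")] * 4
    print(detect_drift(stream, window_size=4, threshold=0.5))
```

With this toy stream, the first two windows are identical and the third shares no values with the second, so only the second comparison is flagged. The explicit `threshold` is an assumption of this sketch; the abstract states only that the proposed framework needs fewer parameters than the framework it is compared with.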
Pages: 872-882
Page count: 11