Interactive information bottleneck for high-dimensional co-occurrence data clustering

Cited by: 5
Authors
Hu, Shizhe [1 ]
Wang, Ruobin [1 ]
Ye, Yangdong [1 ]
Affiliations
[1] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Clustering; High-dimensional data; Information bottleneck; MIXTURE MODEL; FEATURE-SELECTION;
DOI
10.1016/j.asoc.2021.107837
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering high-dimensional data is challenging because of the large amount of redundant and irrelevant information contained in the features. Most existing methods perform feature dimensionality reduction and data clustering on the resulting low-dimensional representations, either sequentially or jointly. However, these methods neglect the relationships between the clustered data points and the dimension-reduced features, as well as the influence of those relationships on learning the low-dimensional feature subspace. In this paper, an embarrassingly simple yet effective interactive information bottleneck (IIB) method is proposed for clustering high-dimensional co-occurrence data by performing data clustering and low-dimensional feature subspace learning simultaneously. Unlike existing methods, we cluster the data while maximally preserving the correlations between the data clusters and the learned dimension-reduced features, and simultaneously learn the low-dimensional feature subspace while maintaining its correlations with the data clustering results obtained in the previous iteration. The two stages are thus interactive and mutually refining. Finally, a new twin "draw-and-merge" method is designed for optimization. Experimental results on four high-dimensional datasets demonstrate the superiority and effectiveness of the proposed method. (C) 2021 Elsevier B.V. All rights reserved.
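The alternating scheme sketched in the abstract can be illustrated with a toy example. The snippet below is not the authors' IIB algorithm (which optimizes information-theoretic objectives with a twin "draw-and-merge" procedure); it is a minimal, hypothetical alternating co-clustering routine in the same spirit: row (data) clusters and column (feature) clusters of a co-occurrence matrix are each re-estimated while conditioning on the other side's current partition, so the two stages refine each other across iterations. All function and variable names are illustrative.

```python
import numpy as np


def alternating_info_coclustering(P, n_row_clusters, n_col_clusters,
                                  n_iters=20, seed=0):
    """Toy alternating co-clustering of a co-occurrence matrix P(x, y).

    Rows (data points) and columns (features) are alternately re-assigned,
    each side conditioning on the other side's current partition, by
    minimizing the KL divergence to cluster-centroid distributions.
    """
    rng = np.random.default_rng(seed)
    P = P / P.sum()                      # joint distribution p(x, y)
    n_rows, n_cols = P.shape
    row_c = rng.integers(n_row_clusters, size=n_rows)
    col_c = rng.integers(n_col_clusters, size=n_cols)
    eps = 1e-12

    for _ in range(n_iters):
        # --- Row step: cluster data given the current feature clusters ---
        # Aggregate columns by their clusters: p(x, s)
        col_agg = np.zeros((n_rows, n_col_clusters))
        for s in range(n_col_clusters):
            col_agg[:, s] = P[:, col_c == s].sum(axis=1)
        # Row-cluster centroids: p(s | t)
        cent = np.zeros((n_row_clusters, n_col_clusters))
        for t in range(n_row_clusters):
            mass = col_agg[row_c == t].sum(axis=0)
            cent[t] = mass / max(mass.sum(), eps)
        # Reassign each row to the KL-closest centroid
        cond = col_agg / np.maximum(col_agg.sum(axis=1, keepdims=True), eps)
        kl = (cond[:, None, :] * (np.log(cond[:, None, :] + eps)
                                  - np.log(cent[None, :, :] + eps))).sum(axis=2)
        row_c = kl.argmin(axis=1)

        # --- Column step: symmetric, conditioning on the new row clusters ---
        row_agg = np.zeros((n_row_clusters, n_cols))
        for t in range(n_row_clusters):
            row_agg[t] = P[row_c == t].sum(axis=0)
        centc = np.zeros((n_col_clusters, n_row_clusters))
        for s in range(n_col_clusters):
            mass = row_agg[:, col_c == s].sum(axis=1)
            centc[s] = mass / max(mass.sum(), eps)
        condc = row_agg / np.maximum(row_agg.sum(axis=0, keepdims=True), eps)
        klc = (condc.T[:, None, :] * (np.log(condc.T[:, None, :] + eps)
                                      - np.log(centc[None, :, :] + eps))).sum(axis=2)
        col_c = klc.argmin(axis=1)

    return row_c, col_c
```

On a block-structured co-occurrence matrix (e.g. two disjoint groups of documents using two disjoint vocabularies), rows sharing a block receive the same cluster label, and likewise for columns; the sketch only conveys the interactive back-and-forth, not the paper's actual objective or optimizer.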
Pages: 11