Deep clustering of tabular data by weighted Gaussian distribution learning

Authors
Rabbani, Shourav B. [1 ]
Medri, Ivan V.
Samad, Manar D. [1 ]
Affiliations
[1] Tennessee State Univ, Dept Comp Sci, 3500 John A Merritt Blvd, Nashville, TN 37209 USA
Keywords
Tabular data; Deep clustering; Embedding clustering; Multivariate Gaussian; Autoencoder
DOI
10.1016/j.neucom.2025.129359
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning methods are primarily proposed for supervised learning of images or text with limited applications to clustering problems. In contrast, tabular data with heterogeneous feature space pose unique challenges in representation learning, where deep learning has yet to replace traditional machine learning. This paper addresses these challenges by developing one of the first deep clustering methods for tabular data, Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS). G-CEALS is an unsupervised deep clustering framework for learning the parameters of multivariate Gaussian cluster distributions by iteratively updating individual cluster weights. The G-CEALS method presents average rank orderings of 2.9 (1.7) and 2.8 (1.7) based on clustering accuracy and adjusted Rand index (ARI) scores on sixteen tabular data sets, respectively, and outperforms nine state-of-the-art clustering methods. G-CEALS substantially improves clustering performance compared to traditional K-means and GMM, which are still de facto methods for clustering tabular data. Similar computationally efficient and high-performing deep clustering frameworks are imperative to reap the myriad benefits of deep learning on tabular data over traditional machine learning.
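The abstract compares methods by clustering accuracy and adjusted Rand index (ARI). The following is a minimal sketch of how these two metrics are commonly computed from ground-truth labels and predicted cluster assignments. It is not the authors' evaluation code: the function names `clustering_accuracy` and `adjusted_rand_index` are illustrative, and the brute-force label matching is an assumption that only suits a small number of clusters.

```python
from collections import Counter
from itertools import permutations
from math import comb


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to class labels.

    Brute force over permutations (assumption: few clusters, labels are not None).
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad the shorter side with None so every cluster gets a (possibly empty) match.
    k = max(len(classes), len(clusters))
    classes += [None] * (k - len(classes))
    clusters += [None] * (k - len(clusters))
    counts = Counter(zip(y_pred, y_true))  # (cluster, class) -> number of samples
    best = 0
    for perm in permutations(classes):
        matched = sum(counts[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, matched)
    return best / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    n = len(y_true)
    if n < 2:
        return 1.0  # no sample pairs to compare
    sum_ij = sum(comb(v, 2) for v in Counter(zip(y_true, y_pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(y_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels (made up for illustration, not data from the paper).
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [1, 1, 0, 0, 0, 0]
    print(clustering_accuracy(y_true, y_pred))
    print(adjusted_rand_index(y_true, y_pred))
```

Both metrics are invariant to how cluster ids are numbered, which is why accuracy needs the label-matching step before counting correct assignments.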
Pages: 12