Denoising Autoencoder as an Effective Dimensionality Reduction and Clustering of Text Data

Cited by: 12
Authors
Leyli-Abadi, Milad [1]
Labiod, Lazhar [1]
Nadif, Mohamed [1]
Affiliations
[1] Paris Descartes Univ, LIPADE, F-75006 Paris, France
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT II | 2017, Vol. 10235
Keywords
Auto-encoder; Deep learning; Cosine similarity; Neighborhood; Document clustering; Unsupervised learning; Dimensionality reduction; FRAMEWORK
DOI
10.1007/978-3-319-57529-2_62
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning methods are widely used in vision and face recognition; however, they have rarely been applied to text data. In this context, the data is often represented by a sparse, high-dimensional document-term matrix. To deal with such matrices, we present in this paper a new denoising auto-encoder for dimensionality reduction, in which each document is affected not only by its own information but also by information from its neighbors, as determined by the cosine similarity measure. The proposed auto-encoder can discover low-dimensional embeddings and, as a result, reveal the underlying effective manifold structure. The visual representation of these embeddings suggests that the documents can suitably be clustered with the Expectation-Maximization algorithm for Gaussian mixture models. On real-world datasets, the relevance of the presented auto-encoder for visualisation and document clustering is shown by a comparison with five widely used unsupervised dimensionality reduction methods, including the classic auto-encoder.
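The neighborhood idea in the abstract (each document being influenced by its most cosine-similar neighbors) can be illustrated with a minimal sketch. This is not the authors' method: the record does not give the paper's formulation, so the function names `cosine_similarity` and `neighbor_smoothed`, the parameter `k`, and the weighting scheme (weight 1 for the document itself, cosine similarity for each neighbor) are all assumptions made for illustration. The sketch only builds neighbor-informed rows from a toy document-term matrix; it does not train an auto-encoder or fit a Gaussian mixture model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def neighbor_smoothed(docs, k=1):
    """Blend each row with its k most cosine-similar rows.

    Assumed (hypothetical) weighting: the document itself has weight 1,
    and each neighbor is weighted by its cosine similarity.
    """
    smoothed = []
    for i, doc in enumerate(docs):
        # Similarity of document i to every other document.
        sims = [
            (cosine_similarity(doc, other), j)
            for j, other in enumerate(docs)
            if j != i
        ]
        sims.sort(reverse=True)
        neighbors = sims[:k]
        total = 1.0 + sum(s for s, _ in neighbors)
        row = [
            (doc[t] + sum(s * docs[j][t] for s, j in neighbors)) / total
            for t in range(len(doc))
        ]
        smoothed.append(row)
    return smoothed


# Toy document-term matrix: rows are documents, columns are term counts.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
]
targets = neighbor_smoothed(docs, k=1)
```

In a setup like the one the abstract describes, rows such as `targets` could plausibly serve as neighbor-informed inputs or reconstruction targets for an auto-encoder, with the learned low-dimensional codes then clustered by a Gaussian mixture model; how the paper actually combines these steps is not stated in this record.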
Pages: 801-813
Page count: 13