Improving neural topic modeling via Sinkhorn divergence

Cited by: 12
Authors
Liu, Luyang [1]
Huang, Heyan [1,2,3]
Gao, Yang [1]
Zhang, Yongfeng [4]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Southeast Acad Informat Technol, Putian 351100, Fujian, Peoples R China
[3] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[4] Rutgers State Univ, Dept Comp Sci, 110 Frelinghuysen Rd, Piscataway, NJ 08854 USA
Keywords
Deep learning; Topic model; Sinkhorn divergence; Auto-encoder
DOI
10.1016/j.ipm.2021.102864
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Textual data are a major medium through which internet users convey content, so discovering the latent topics within such data effectively and efficiently has essential theoretical and practical value. Recently, neural topic models (NTMs), especially Variational Auto-encoder-based NTMs, have proved to be a successful approach for mining meaningful and interpretable topics. However, they usually suffer from two major issues: (1) posterior collapse: the KL divergence rapidly reaches zero, resulting in low-quality representations in the latent distribution; (2) unconstrained topic generative models: topic generative models are typically unconstrained, which can lead to discovering redundant topics. To address these issues, we propose the Autoencoding Sinkhorn Topic Model, built on the Sinkhorn Auto-encoder (SAE) and the Sinkhorn divergence. The SAE uses the Sinkhorn divergence rather than the problematic KL divergence to optimize the difference between the posterior and the prior, and is therefore free of posterior collapse. Then, to reduce topic redundancy, Sinkhorn Topic Diversity Regularization (STDR) is presented. STDR leverages the proposed Salient Topic Layer and the Sinkhorn divergence to measure the distance between salient topic features, and serves as a penalty term in the loss function that facilitates the discovery of diversified topics during training. Experiments conducted on two popular datasets verify our contributions, and the results demonstrate the effectiveness of the proposed model.
Pages: 16
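The abstract's central quantity is the Sinkhorn divergence, the debiased, entropic-regularized optimal-transport distance that replaces the KL term and drives the STDR penalty. The paper's implementation is not reproduced here; the following is a minimal, self-contained sketch of that quantity between two sets of samples (e.g., posterior samples vs. prior samples, or rows of a salient-topic feature matrix). The function names, the squared-Euclidean ground cost, and the hyperparameters `eps` and `n_iters` are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_cost(x, y, eps=0.1, n_iters=200):
    """Entropic-regularized OT cost <P, C> between two uniformly weighted
    point clouds x (n, d) and y (m, d), via log-domain Sinkhorn iterations."""
    n, m = x.shape[0], y.shape[0]
    # Squared-Euclidean ground cost matrix C[i, j] = ||x_i - y_j||^2.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    log_a = np.full(n, -np.log(n))  # log of uniform marginal weights
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)  # dual potentials
    g = np.zeros(m)
    for _ in range(n_iters):
        # Log-sum-exp updates keep the iterations numerically stable.
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    # Recover the transport plan P from the potentials, then return <P, C>.
    log_P = (f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :]
    return float((np.exp(log_P) * C).sum())

def sinkhorn_divergence(x, y, eps=0.1):
    """Debiased Sinkhorn divergence:
    S_eps(x, y) = OT_eps(x, y) - (OT_eps(x, x) + OT_eps(y, y)) / 2."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * (sinkhorn_cost(x, x, eps) + sinkhorn_cost(y, y, eps)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    posterior = rng.normal(0.5, 1.0, size=(64, 16))  # stand-in encoder samples
    prior = rng.normal(0.0, 1.0, size=(64, 16))      # stand-in prior samples
    print(sinkhorn_divergence(posterior, prior))     # > 0; near 0 when matched
```

The debiasing terms ensure S_eps(x, x) = 0, which is what lets the quantity behave as a divergence: minimized as the posterior approaches the prior (avoiding the abrupt collapse behavior of the KL term), and, when applied between salient topic features as in STDR, penalizing topics that transport onto one another too cheaply, i.e., redundant topics.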