InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

Cited by: 0
Authors
Wu, Xiaobao [1 ]
Dong, Xinshuai [2 ]
Nguyen, Thong [3 ]
Liu, Chaoqun [1 ,4 ]
Pan, Liang-Ming [3 ]
Luu, Anh Tuan [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Natl Univ Singapore, Singapore, Singapore
[4] DAMO Acad, Alibaba Grp, Singapore, Singapore
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11 | 2023
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual topic models have been prevalent for cross-lingual text analysis by revealing aligned latent topics. However, most existing methods suffer from producing repetitive topics that hinder further analysis and performance decline caused by low-coverage dictionaries. In this paper, we propose the Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment in previous work, we propose a topic alignment with mutual information method. This works as a regularization to properly align topics and prevent degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations of a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
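The abstract does not spell out the alignment objective, so the following is only a minimal sketch of the general idea, assuming an InfoNCE-style contrastive loss, a common lower-bound estimator of mutual information. It is not taken from the paper. The function name `info_nce`, the `temperature` value, and the arrays `en_topic_repr`, `zh_aligned`, and `zh_random` are hypothetical names introduced for illustration; each row stands for the topic representation of one word, and row i of the two inputs is treated as a linked cross-lingual word pair.

```python
import numpy as np


def info_nce(anchor, positive, temperature=0.2):
    """InfoNCE loss where row i of `anchor` is paired with row i of `positive`.

    Minimizing this loss maximizes a lower bound on the mutual information
    between the two sets of representations (assumed objective, not the
    paper's confirmed formulation).
    """
    # L2-normalize rows so that dot products are cosine similarities.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (n_pairs, n_pairs)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the linked pairs; other entries act as negatives.
    return -np.mean(np.diag(log_prob))


rng = np.random.default_rng(0)
n_pairs, n_topics = 8, 50  # hypothetical sizes

# Hypothetical topic representations of English words.
en_topic_repr = rng.normal(size=(n_pairs, n_topics))
# Linked Chinese words with nearly identical topic representations.
zh_aligned = en_topic_repr + 0.01 * rng.normal(size=(n_pairs, n_topics))
# Unrelated representations, for comparison.
zh_random = rng.normal(size=(n_pairs, n_topics))

loss_aligned = info_nce(en_topic_repr, zh_aligned)
loss_random = info_nce(en_topic_repr, zh_random)
print(loss_aligned, loss_random)
```

Under this assumed objective, well-aligned pairs give a lower loss than unrelated ones. If every word collapsed to the same topic representation, the loss would sit at log(n_pairs) instead of approaching zero, which is one way such a term can discourage the degenerate representations the abstract mentions.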
Pages: 13763 - 13771
Page count: 9
Related papers
50 in total
  • [21] Dictionary methods for cross-lingual information retrieval
    Ballesteros, L
    Croft, B
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, 1996, 1134 : 791 - 801
  • [22] A system for supporting cross-lingual information retrieval
    Capstick, J
    Diagne, AK
    Erbach, G
    Uszkoreit, H
    Leisenberg, A
    Leisenberg, M
    INFORMATION PROCESSING & MANAGEMENT, 2000, 36 (02) : 275 - 289
  • [23] Cross-Lingual Information to the Rescue in Keyword Extraction
    Huang, Chung-Chi
    Eskenazi, Maxine
    Carbonell, Jaime
    Ku, Lun-Wei
    Yang, Ping-Che
    PROCEEDINGS OF 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: SYSTEM DEMONSTRATIONS, 2014, : 1 - 6
  • [24] Cross-Lingual Sentence Extraction for Information Distillation
    Singla, Adish Kumar
    Hakkani-Tuer, Dilek
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2707 - 2710
  • [25] Detecting Cross-Lingual Information Gaps in Wikipedia
    Ashrafmoghari, Vahid
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 581 - 585
  • [26] Multilingual modeling of cross-lingual spelling variants
    Linden, Krister
    INFORMATION RETRIEVAL, 2006, 9 (03): : 295 - 310
  • [27] Multilingual modeling of cross-lingual spelling variants
    Krister Lindén
    Information Retrieval, 2006, 9 : 295 - 310
  • [28] Enhancing Graph Variational Autoencoder for Short Text Topic Modeling with Mutual Information Maximization
    Ge, Yuhang
    Hu, Xuegang
    2022 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG), 2022, : 64 - 70
  • [29] Enhancing Cross-Lingual Topic-Essay Generation with Knowledge and Topic Consistency Constraints
    Gu, Huailing
    Huang, Yuxin
    Yu, Zhengtao
    Mao, Cunli
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT IV, NLPCC 2024, 2025, 15362 : 277 - 289
  • [30] Cross-lingual Link Prediction Using Multimodal Relational Topic Models
    Sakata, Yosuke
    Eguchi, Koji
    2016 IEEE/ACIS 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2016, : 951 - 958