InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

Authors
Wu, Xiaobao [1 ]
Dong, Xinshuai [2 ]
Nguyen, Thong [3 ]
Liu, Chaoqun [1 ,4 ]
Pan, Liang-Ming [3 ]
Luu, Anh Tuan [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Carnegie Mellon Univ, Pittsburgh, PA USA
[3] Natl Univ Singapore, Singapore, Singapore
[4] DAMO Acad, Alibaba Grp, Singapore, Singapore
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11 | 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-lingual topic models have been prevalent for cross-lingual text analysis because they reveal aligned latent topics. However, most existing methods produce repetitive topics that hinder further analysis, and their performance declines when the dictionaries they rely on have low coverage. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a method for topic alignment with mutual information. This works as a regularization that properly aligns topics and prevents degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment beyond the translations of a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
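The abstract does not say how the mutual information between linked cross-lingual words is estimated. As a rough illustration only, the sketch below assumes an InfoNCE-style contrastive lower bound, one common estimator, which may differ from what the paper uses. The function names, the temperature value, and the toy vectors are all hypothetical. It shows why maximizing such a bound discourages degenerate representations: when every word has the same topic representation, the bound collapses to zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(src_vecs, tgt_vecs, temperature=0.2):
    """Average InfoNCE loss over linked pairs (src_vecs[i], tgt_vecs[i]).

    For source word i, target word i is the positive and every other
    target word serves as a negative. The temperature is an assumed value.
    """
    n = len(src_vecs)
    total = 0.0
    for i in range(n):
        logits = [cosine(src_vecs[i], tgt_vecs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate target words.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def mi_lower_bound(src_vecs, tgt_vecs, temperature=0.2):
    """InfoNCE bound: I(src; tgt) >= log(n) - loss."""
    return math.log(len(src_vecs)) - info_nce_loss(src_vecs, tgt_vecs, temperature)


# Toy topic representations (hypothetical): two linked word pairs, two topics.
aligned_src = [[1.0, 0.0], [0.0, 1.0]]
aligned_tgt = [[1.0, 0.0], [0.0, 1.0]]
# Degenerate case: every word carries the same topic representation.
collapsed = [[0.5, 0.5], [0.5, 0.5]]

bound_aligned = mi_lower_bound(aligned_src, aligned_tgt)
bound_collapsed = mi_lower_bound(collapsed, collapsed)
```

In this toy setting the aligned, distinct representations give a bound close to log 2, while the collapsed ones give a bound of zero, so a regularizer that maximizes the bound pushes linked words together and unlinked words apart.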
Pages: 13763 - 13771
Number of pages: 9
Related papers
50 in total
  • [31] Neural topic-enhanced cross-lingual word embeddings for CLIR
    Zhou, Dong
    Qu, Wei
    Li, Lin
    Tang, Mingdong
    Yang, Aimin
    INFORMATION SCIENCES, 2022, 608 : 809 - 824
  • [32] Comparative analysis of book tags: a cross-lingual perspective
    Lu, Chao
    Zhang, Chengzhi
    He, Daqing
    ELECTRONIC LIBRARY, 2016, 34 (04) : 666 - 682
  • [33] Cross-lingual Contextualized Topic Models with Zero-shot Learning
    Bianchi, Federico
    Terragni, Silvia
    Hovy, Dirk
    Nozza, Debora
    Fersini, Elisabetta
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 1676 - 1683
  • [34] DISENTANGLED SPEAKER AND LANGUAGE REPRESENTATIONS USING MUTUAL INFORMATION MINIMIZATION AND DOMAIN ADAPTATION FOR CROSS-LINGUAL TTS
    Xin, Detai
    Komatsu, Tatsuya
    Takamichi, Shinnosuke
    Saruwatari, Hiroshi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6608 - 6612
  • [35] CrossMath: Towards Cross-lingual Math Information Retrieval
    Gore, James
    Polletta, Joseph
    Mansouri, Behrooz
    PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024, 2024, : 101 - 105
  • [36] A method of cross-lingual consumer health information retrieval
    Neveol, Aurelie
    Pereira, Suzanne
    Soualmia, Lina F.
    Thirion, Benoit
    Darmoni, Stefan J.
    UBIQUITY: TECHNOLOGIES FOR BETTER HEALTH IN AGING SOCIETIES, 2006, 124 : 601 - 608
  • [37] Cross-Lingual Information Retrieval System for Indian Languages
    Jagarlamudi, Jagadeesh
    Kumaran, A.
    ADVANCES IN MULTILINGUAL AND MULTIMODAL INFORMATION RETRIEVAL, 2008, 5152 : 80 - 87
  • [38] CrossOIE: Cross-Lingual Classifier for Open Information Extraction
    Cabral, Bruno Souza
    Glauber, Rafael
    Souza, Marlo
    Claro, Daniela Barreiro
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2020, 2020, 12037 : 368 - 378
  • [39] Exploiting Wikipedia for cross-lingual and multilingual information retrieval
    Sorg, P.
    Cimiano, P.
    DATA & KNOWLEDGE ENGINEERING, 2012, 74 : 26 - 45
  • [40] Evaluating and Modeling Attribution for Cross-Lingual Question Answering
    Muller, Benjamin
    Wieting, John
    Clark, Jonathan H.
    Kwiatkowski, Tom
    Ruder, Sebastian
    Soares, Livio Baldini
    Aharoni, Roee
    Herzig, Jonathan
    Wang, Xinyi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 144 - 157