Reproducible Extraction of Cross-lingual Topics (rectr)

Cited by: 19
Authors
Chan, Chung-Hong [1]
Zeng, Jing [2]
Wessler, Hartmut [3]
Jungblut, Marc [4]
Welbers, Kasper [5]
Bajjalieh, Joseph W. [6]
van Atteveldt, Wouter [5]
Althaus, Scott L. [6]
Affiliations
[1] Univ Mannheim, Mannheimer Zentrum Europa Sozialforsch, D-68131 Mannheim, Germany
[2] Univ Zurich, Dept Commun & Media Res, Zurich, Switzerland
[3] Univ Mannheim, Inst Media & Commun Studies, Mannheim, Germany
[4] LMU Munchen, Dept Media & Commun, Munich, Germany
[5] Vrije Univ Amsterdam, Dept Commun Sci, Amsterdam, Netherlands
[6] Univ Illinois, Cline Ctr Adv Social Res, Urbana, IL USA
Funding
National Endowment for the Humanities (USA)
Keywords
SENTIMENT ANALYSIS; TEXT; TRANSLATION
DOI
10.1080/19312458.2020.1812555
Chinese Library Classification (CLC) number
G2 [Information and Knowledge Dissemination]
Subject classification code
05; 0503
Abstract
With global media content databases and online content now widely available, analyzing topical structures in several languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language with a machine translation service such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method uses open-source aligned word embeddings to capture the cross-lingual meanings of words, and it includes a mechanism to normalize the residual influence of language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news by our method with those produced by methods used in the literature. We show that our method is not only reproducible but also generates high-quality cross-lingual topics. Finally, we demonstrate how our method can be applied to track news topics across time and languages.
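To make the abstract's two ingredients (aligned embeddings and language normalization) concrete, here is a minimal Python sketch under stated assumptions. It assumes (1) a tiny hypothetical table of "aligned" vectors standing in for real aligned word embeddings, (2) each document represented as the average of its words' vectors, and (3) language normalization done by subtracting each language's mean document vector. These are illustrative choices, not rectr's actual implementation or API (rectr itself is an R package); the names `ALIGNED`, `doc_vector`, and `center_by_language` are hypothetical.

```python
from math import sqrt

# Hypothetical toy "aligned" embeddings: words with the same meaning in
# different languages sit close together in one shared space. The German
# vectors carry an extra +0.2 in the third dimension to mimic a residual
# language signal. Real aligned embeddings have hundreds of dimensions.
ALIGNED = {
    "en": {"election": [0.9, 0.1, 0.0], "vote": [0.8, 0.2, 0.1],
           "football": [0.0, 0.9, 0.2], "goal": [0.1, 0.8, 0.3]},
    "de": {"wahl": [0.9, 0.1, 0.2], "stimme": [0.8, 0.2, 0.3],
           "fussball": [0.0, 0.9, 0.4], "tor": [0.1, 0.8, 0.5]},
}


def doc_vector(tokens, lang):
    """Average the aligned vectors of the document's in-vocabulary tokens."""
    vecs = [ALIGNED[lang][t] for t in tokens if t in ALIGNED[lang]]
    if not vecs:
        raise ValueError("document has no in-vocabulary tokens")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def center_by_language(docs):
    """Subtract each language's mean document vector (assumed normalization)."""
    by_lang = {}
    for lang, vec in docs:
        by_lang.setdefault(lang, []).append(vec)
    means = {lang: [sum(col) / len(vs) for col in zip(*vs)]
             for lang, vs in by_lang.items()}
    return [[x - m for x, m in zip(vec, means[lang])] for lang, vec in docs]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


# Usage: two English and two German documents on politics and sports.
corpus = [("en", ["election", "vote"]), ("en", ["football", "goal"]),
          ("de", ["wahl", "stimme"]), ("de", ["fussball", "tor"])]
docs = [(lang, doc_vector(tokens, lang)) for lang, tokens in corpus]
centered = center_by_language(docs)

print(round(cosine(centered[0], centered[2]), 3))  # prints 1.0 (en vs de politics)
print(round(cosine(centered[0], centered[3]), 3))  # prints -1.0 (en politics vs de sports)
```

In this toy corpus, centering removes the constant German offset, so the English and German politics documents end up with the same vector. A topic-extraction step (for example, clustering the centered vectors) would follow; which model rectr uses for that step is not specified here.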
Pages: 285-305
Page count: 21