Reproducible Extraction of Cross-lingual Topics (rectr)

Cited by: 20
Authors
Chan, Chung-Hong [1]
Zeng, Jing [2]
Wessler, Hartmut [3]
Jungblut, Marc [4]
Welbers, Kasper [5]
Bajjalieh, Joseph W. [6]
van Atteveldt, Wouter [5]
Althaus, Scott L. [6]
Affiliations
[1] Univ Mannheim, Mannheimer Zentrum Europa Sozialforsch, D-68131 Mannheim, Germany
[2] Univ Zurich, Dept Commun & Media Res, Zurich, Switzerland
[3] Univ Mannheim, Inst Media & Commun Studies, Mannheim, Germany
[4] LMU Munchen, Dept Media & Commun, Munich, Germany
[5] Vrije Univ Amsterdam, Dept Commun Sci, Amsterdam, Netherlands
[6] Univ Illinois, Cline Ctr Adv Social Res, Urbana, IL USA
Funding
National Endowment for the Humanities (US)
Keywords
SENTIMENT ANALYSIS; TEXT; TRANSLATION
DOI
10.1080/19312458.2020.1812555
Chinese Library Classification
G2 [Information and Knowledge Dissemination]
Discipline classification code
05; 0503
Abstract
With global media content databases and online content being available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source aligned word embeddings to understand the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied in tracking news topics across time and languages.
Pages: 285-305
Page count: 21