Reproducible Extraction of Cross-lingual Topics (rectr)

Cited by: 19
Authors
Chan, Chung-Hong [1]
Zeng, Jing [2]
Wessler, Hartmut [3]
Jungblut, Marc [4]
Welbers, Kasper [5]
Bajjalieh, Joseph W. [6]
van Atteveldt, Wouter [5]
Althaus, Scott L. [6]
Affiliations
[1] Univ Mannheim, Mannheimer Zentrum Europa Sozialforsch, D-68131 Mannheim, Germany
[2] Univ Zurich, Dept Commun & Media Res, Zurich, Switzerland
[3] Univ Mannheim, Inst Media & Commun Studies, Mannheim, Germany
[4] LMU Munchen, Dept Media & Commun, Munich, Germany
[5] Vrije Univ Amsterdam, Dept Commun Sci, Amsterdam, Netherlands
[6] Univ Illinois, Cline Ctr Adv Social Res, Urbana, IL USA
Funding
National Endowment for the Humanities (USA);
Keywords
SENTIMENT ANALYSIS; TEXT; TRANSLATION;
DOI
10.1080/19312458.2020.1812555
Chinese Library Classification number
G2 [Information and Knowledge Dissemination];
Discipline classification code
05 ; 0503 ;
Abstract
With global media content databases and online content now widely available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method uses open-source aligned word embeddings to capture the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with those extracted using methods from the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied to tracking news topics across time and languages.
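The two ingredients named in the abstract, document representations built from aligned word embeddings and a normalization step for residual language differences, can be illustrated with a minimal sketch. This is not the rectr implementation: the toy 3-dimensional vectors, the function names `doc_vector` and `center_by_language`, and the use of per-language mean-centering as the normalization step are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy vectors in one shared ("aligned") space. The German
# vectors carry an artificial offset in the third dimension to mimic
# residual language influence.
EMB = {
    "en": {"election": [0.9, 0.1, 0.0], "vote": [0.8, 0.2, 0.0],
           "football": [0.0, 0.9, 0.1]},
    "de": {"wahl": [0.9, 0.1, 0.5], "stimme": [0.8, 0.2, 0.5],
           "fussball": [0.0, 0.9, 0.6]},
}


def doc_vector(tokens, lang):
    """Average the aligned vectors of the tokens found in the vocabulary."""
    vecs = [EMB[lang][t] for t in tokens if t in EMB[lang]]
    return np.mean(vecs, axis=0)


def center_by_language(X, langs):
    """Subtract each language's mean document vector (assumed normalization)."""
    X = np.asarray(X, dtype=float).copy()
    langs = np.asarray(langs)
    for lang in np.unique(langs):
        mask = langs == lang
        X[mask] -= X[mask].mean(axis=0)
    return X


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


docs = [
    (["election", "vote"], "en"),
    (["football"], "en"),
    (["wahl", "stimme"], "de"),
    (["fussball"], "de"),
]
langs = [lang for _, lang in docs]
X = np.array([doc_vector(tokens, lang) for tokens, lang in docs])
Xc = center_by_language(X, langs)

# Compare the English and German politics documents before and after centering.
print(cosine(X[0], X[2]), cosine(Xc[0], Xc[2]))
```

In this toy setup the centered English and German politics documents end up closer to each other than the uncentered ones, so a clustering or topic model fitted on `Xc` would group them by topic rather than by language.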
Pages: 285-305
Page count: 21