Cross-lingual learning for text processing: A survey

Cited by: 33
Authors
Pikuliak, Matus [1]
Simko, Marian [1]
Bielikova, Maria [1]
Affiliations
[1] Slovak Univ Technol Bratislava, Fac Informat & Informat Technol, Ilkovicova 2, Bratislava 84216, Slovakia
Keywords
Cross-lingual learning; Multilingual learning; Transfer learning; Deep learning; Machine learning; Text processing; Natural language processing;
DOI
10.1016/j.eswa.2020.113765
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many intelligent systems in business, government or academia process natural language as input during inference, or even communicate with users in natural language. Natural language processing is currently often done with machine learning models. However, machine learning needs training data, and such data are often scarce for low-resource languages. The lack of data and the resulting poor performance of natural language processing can be solved with cross-lingual learning. Cross-lingual learning is a paradigm for transferring knowledge from one natural language to another. The transfer of knowledge can help us overcome the lack of data in the target languages and create intelligent systems and machine learning models for languages where this was not previously possible. Despite its increasing popularity and potential, no comprehensive survey on cross-lingual learning has been conducted so far. We survey 173 text processing cross-lingual learning papers and examine the tasks, data sets and languages that were used. The most important contribution of our work is that we identify and analyze four types of cross-lingual transfer based on "what" is being transferred. Such insight might help other NLP researchers and practitioners understand how to use cross-lingual learning for a wide range of problems. In addition, we identify what we consider to be the most important research directions, which might help the community focus its future work in cross-lingual learning. We present a comprehensive table of all the surveyed papers with various data related to the cross-lingual learning techniques they use. The table can be used to find relevant papers and compare the approaches to cross-lingual learning. To the best of our knowledge, no survey of cross-lingual text processing techniques has been done in this scope before. (C) 2020 Published by Elsevier Ltd.
Pages: 26
Related papers
50 in total
  • [41] A cross-lingual transfer learning method for online COVID-19-related hate speech detection
    Liu, Lin
    Xu, Duo
    Zhao, Pengfei
    Zeng, Daniel Dajun
    Hu, Paul Jen-Hwa
    Zhang, Qingpeng
    Luo, Yin
    Cao, Zhidong
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 234
  • [42] XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages
    Abhishek, Tushar
    Sagare, Shivprasad
    Singh, Bhavyajeet
    Sharma, Anubhav
    Gupta, Manish
    Varma, Vasudeva
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 171 - 175
  • [43] Cross-Lingual Few-Shot Hate Speech and Offensive Language Detection Using Meta Learning
    Mozafari, Marzieh
    Farahbakhsh, Reza
    Crespi, Noel
    IEEE ACCESS, 2022, 10 : 14880 - 14896
  • [44] Cross-lingual aspect-based sentiment analysis: A survey on tasks, approaches, and challenges
    Smid, Jakub
    Kral, Pavel
    INFORMATION FUSION, 2025, 120
  • [45] A comparative study of cross-lingual sentiment analysis
    Priban, Pavel
    Smid, Jakub
    Steinberger, Josef
    Mistera, Adam
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [46] Cross-lingual Adaptation Using Universal Dependencies
    Taghizadeh, Nasrin
    Faili, Heshaam
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (04)
  • [47] Speech Recognition for Turkic Languages Using Cross-Lingual Transfer Learning from Kazakh
    Orel, Daniil
    Yeshpanov, Rustem
    Varol, Huseyin Atakan
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023, : 174 - 182
  • [48] Cross-lingual offensive speech identification with transfer learning for low-resource languages
    Shi, Xiayang
    Liu, Xinyi
    Xu, Chun
    Huang, Yuanyuan
    Chen, Fang
    Zhu, Shaolin
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [49] Multi-aspect multilingual and cross-lingual parliamentary speech analysis
    Miok, Kristian
    Tenorio, Encarnacion Hidalgo
    Osenova, Petya
    Benitez-Castro, Miguel-Angel
    Robnik-Sikonja, Marko
    INTELLIGENT DATA ANALYSIS, 2024, 28 (01) : 239 - 260
  • [50] Cross-Lingual Semantic Role Labeling With Model Transfer
    Fei, Hao
    Zhang, Meishan
    Li, Fei
    Ji, Donghong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2427 - 2437