Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Cited by: 4
Authors
Moreo, Alejandro [1]
Pedrotti, Andrea [1, 2]
Sebastiani, Fabrizio [1]
Affiliations
[1] CNR, Ist Sci & Tecnol lInformaz, I-56124 Pisa, Italy
[2] Univ Pisa, Dipartimento Informat, I-56127 Pisa, Italy
Source
36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021 | 2021
Funding
EU Horizon 2020;
Keywords
Heterogeneous Transfer Learning; Transfer Learning; Text Classification; Ensemble Learning; Word Embeddings;
DOI
10.1145/3412841.3442093
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Funnelling (FUN) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In FUN, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives FUN an edge over CLC systems where these correlations cannot be leveraged. Here we describe Generalized Funnelling (GFUN), a learning ensemble where the meta-classifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show that GFUN improves on FUN by describing experiments on two large, standard multilingual datasets for multi-label text classification.
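
The input that the GFUN meta-classifier receives, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_embedding` and `meta_input`, the toy word vectors, and the posterior values are all hypothetical, and averaging word vectors is only one assumed way of turning aligned word embeddings into a document embedding. The point it shows is that documents written in different languages end up as vectors of the same length in one shared space.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      word_vectors: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Assumed document embedding: the mean of the vectors of known tokens."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def meta_input(posteriors: Sequence[float],
               doc_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-class posteriors with each aligned document embedding."""
    features = list(posteriors)
    for emb in doc_embeddings:
        features.extend(emb)
    return features


# Made-up word vectors, assumed to be already aligned across languages.
en_vectors = {"bank": [1.0, 0.0], "loan": [0.0, 1.0]}
it_vectors = {"banca": [1.0, 0.0], "prestito": [0.0, 1.0]}

# Made-up calibrated posteriors for three classes. The task is multi-label,
# so the values need not sum to 1.
en_doc = meta_input([0.9, 0.1, 0.2],
                    [average_embedding(["bank", "loan"], en_vectors, 2)])
it_doc = meta_input([0.8, 0.2, 0.1],
                    [average_embedding(["banca"], it_vectors, 2)])

print(len(en_doc), len(it_doc))
```

Both vectors have one dimension per class plus the dimensions of each embedding view, so a single meta-classifier could be trained on documents from every language at once.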
Pages: 685-688
Number of pages: 4
Related papers
50 in total
  • [1] Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2023, 41 (02)
  • [2] Cross-Lingual Text Classification with Model Translation and Document Translation
    Moh, Teng-Sheng
    Zhang, Zhang
    PROCEEDINGS OF THE 50TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE, 2012,
  • [3] Cross-lingual Text Classification with Heterogeneous Graph Neural Network
    Wang, Ziyun
    Liu, Xuan
    Yang, Peiji
    Liu, Shixing
    Wang, Zhisheng
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 612 - 620
  • [4] Cross-lingual Short-Text Document Classification for Facebook Comments
    Faqeeh, Mosab
    Abdulla, Nawaf
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Quwaider, Muhannad
    2014 INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD (FICLOUD), 2014, : 573 - 578
  • [5] Cross-lingual Distillation for Text Classification
    Xu, Ruochen
    Yang, Yiming
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1415 - 1425
  • [6] Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph
    Chairatanakul, Nuttapong
    Sriwatanasakdi, Noppayut
    Charoenphakdee, Nontawat
    Liu, Xin
    Murata, Tsuyoshi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1504 - 1517
  • [7] Cross-Lingual Word Embeddings
    Søgaard A.
    Vulić I.
    Ruder S.
    Faruqui M.
    Synthesis Lectures on Human Language Technologies, 2019, 12 (02) : 1 - 132
  • [8] Cross-Lingual Word Embeddings
    Corro, Caio Filippo
    TRAITEMENT AUTOMATIQUE DES LANGUES, 2019, 60 (01) : 46 - 48
  • [9] Cross-Lingual Word Embeddings
    Agirre, Eneko
    COMPUTATIONAL LINGUISTICS, 2020, 46 (01) : 245 - 248
  • [10] Semantic Space Transformations for Cross-Lingual Document Classification
    Martinek, Jiri
    Lenc, Ladislav
    Kral, Pavel
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139 : 608 - 616