Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Cited by: 4
Authors
Moreo, Alejandro [1]
Pedrotti, Andrea [1,2]
Sebastiani, Fabrizio [1]
Affiliations
[1] CNR, Ist Sci & Tecnol lInformaz, I-56124 Pisa, Italy
[2] Univ Pisa, Dipartimento Informat, I-56127 Pisa, Italy
Source
36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021 | 2021
Funding
EU Horizon 2020;
Keywords
Heterogeneous Transfer Learning; Transfer Learning; Text Classification; Ensemble Learning; Word Embeddings;
DOI
10.1145/3412841.3442093
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Funnelling (FUN) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In FUN, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives FUN an edge over CLC systems where these correlations cannot be leveraged. We here describe Generalized Funnelling (GFUN), a learning ensemble where the meta-classifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show that GFUN improves on FUN by describing experiments on two large, standard multilingual datasets for multi-label text classification.
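The data flow described in the abstract can be illustrated with a minimal sketch of how a GFUN-style meta-classifier input might be assembled: the calibrated posterior vector is concatenated with language-aligned document embeddings. Everything concrete here is an assumption made for illustration and not taken from the paper: the function names `average_embedding` and `meta_input`, the toy vocabularies and vectors, the hard-coded posteriors (standing in for the output of trained, calibrated 1st-tier classifiers), and the choice of a plain average of word vectors as the document embedding.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the tokens found in `table` (zero vector if none).

    Plain averaging is an assumption of this sketch, not the paper's method.
    """
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def meta_input(posteriors: Vector, *views: Vector) -> Vector:
    """Concatenate the posterior vector with each embedding view, in order."""
    combined = list(posteriors)
    for view in views:
        combined.extend(view)
    return combined


# Toy, made-up data: 2 classes and 2-dimensional word-word embeddings.
# Word-word view: vectors assumed to be already aligned across languages.
WORD_WORD = {
    "en": {"goal": [1.0, 0.0], "match": [0.0, 1.0]},
    "it": {"gol": [1.0, 0.0], "partita": [0.0, 1.0]},
}
# Word-class view: one dimension per class.
WORD_CLASS = {
    "en": {"goal": [1.0, 0.0], "match": [0.5, 0.5]},
    "it": {"gol": [1.0, 0.0], "partita": [0.5, 0.5]},
}


def build_meta_vector(lang: str, tokens: Sequence[str], posteriors: Vector) -> Vector:
    """Build the language-independent vector that a meta-classifier would consume."""
    word_word = average_embedding(tokens, WORD_WORD[lang], dim=2)
    word_class = average_embedding(tokens, WORD_CLASS[lang], dim=2)
    return meta_input(posteriors, word_word, word_class)


# Hypothetical calibrated posteriors from per-language 1st-tier classifiers.
x_en = build_meta_vector("en", ["goal", "match"], [0.75, 0.25])
x_it = build_meta_vector("it", ["gol", "partita"], [0.5, 0.5])
```

In this sketch, documents written in different languages end up as vectors of the same length in a shared space, which is what would let a single meta-classifier be trained on all of them. With only the posterior vector as input, the same structure reduces to the FUN setting.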
Pages: 685-688
Number of pages: 4