Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Cited by: 3
Authors
Moreo, Alejandro [1 ]
Pedrotti, Andrea [1 ]
Sebastiani, Fabrizio [1 ]
Affiliation
[1] CNR, Ist Sci & Tecnol Informaz, Via Giuseppe Moruzzi 1, I-56124 Pisa, Italy
Funding
European Union Horizon 2020
Keywords
Transfer learning; heterogeneous transfer learning; cross-lingual text classification; ensemble learning; word embeddings; representation
DOI
10.1145/3544104
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Funnelling (FUN) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives FUN an edge over CLTC systems in which these correlations cannot be brought to bear. In this article, we describe Generalized FUNnelling (GFUN), a generalization of FUN consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating FUNctions, i.e., language-dependent FUNctions that each produce a language-independent representation ("view") of the (monolingual) document. We describe an instance of GFUN in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in FUN) aggregated with other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings), word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings), and word-context correlations (as encoded by multilingual BERT). We report experimental results obtained on two large, standard datasets for multilingual multilabel text classification, showing that this instance of GFUN substantially improves over FUN and over state-of-the-art baselines. Our code that implements GFUN is publicly available.
Pages: 37
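
To make the two-tier architecture described in the abstract more concrete, below is a minimal Python sketch of the idea, assuming scikit-learn. It shows only one view-generating function, the calibrated posterior probabilities used by FUN; in GFUN, further views (e.g., WCE, MUSE, or mBERT representations) would be concatenated in the same way before reaching the meta-classifier. This is an illustration of the idea, not the authors' publicly available implementation, and the function names (posterior_view, train_meta_classifier) are hypothetical.

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC, SVC


def posterior_view(train_docs, train_labels, docs):
    """View-generating function: train a language-specific classifier on
    TF-IDF features and return calibrated posterior probabilities, i.e.,
    one language-independent dimension per class (as in FUN)."""
    vectorizer = TfidfVectorizer(sublinear_tf=True)
    X_train = vectorizer.fit_transform(train_docs)
    X = vectorizer.transform(docs)
    clf = OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()))
    clf.fit(X_train, train_labels)
    return clf.predict_proba(X)  # shape: (n_docs, n_classes)


def train_meta_classifier(data_by_lang, view_fns):
    """Train the meta-classifier on the language-independent views of all
    languages juxtaposed, which is what lets it exploit class-class (and,
    with further views, word-class / word-word / word-context) correlations.

    data_by_lang: {lang: (train_docs, train_label_matrix)}
    view_fns: functions with signature fn(train_docs, train_labels, docs)
              returning an (n_docs, d) matrix in a language-independent space.
    """
    views, labels = [], []
    for lang, (docs, y) in data_by_lang.items():
        # Concatenate all views produced for this language's training set.
        # (A fuller implementation would typically produce training-set
        # posteriors via k-fold cross-validation to avoid overfitting the
        # meta-classifier; that step is omitted here for brevity.)
        lang_views = [fn(docs, y, docs) for fn in view_fns]
        views.append(np.hstack(lang_views))
        labels.append(np.asarray(y))
    meta = OneVsRestClassifier(SVC(kernel="rbf", probability=True))
    meta.fit(np.vstack(views), np.vstack(labels))
    return meta


# Usage (hypothetical per-language training data, multilabel indicator matrices):
# meta = train_meta_classifier({"en": (en_docs, en_labels),
#                               "it": (it_docs, it_labels)},
#                              view_fns=[posterior_view])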