Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Cited by: 3
Authors
Moreo, Alejandro [1 ]
Pedrotti, Andrea [1 ]
Sebastiani, Fabrizio [1 ]
Affiliation
[1] CNR, Ist Sci & Tecnol Informaz, Via Giuseppe Moruzzi 1, I-56124 Pisa, Italy
Funding
European Union Horizon 2020
Keywords
Transfer learning; heterogeneous transfer learning; cross-lingual text classification; ensemble learning; word embeddings; representation
DOI
10.1145/3544104
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Subject Classification Code
0812
Abstract
Funnelling (FUN) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives FUN an edge over CLTC systems in which these correlations cannot be brought to bear. In this article, we describe Generalized FUNnelling (GFUN), a generalization of FUN consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating FUNctions, i.e., language-dependent FUNctions that each produce a language-independent representation ("view") of the (monolingual) document. We describe an instance of GFUN in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in FUN) aggregated with other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings), word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings), and word-context correlations (as encoded by multilingual BERT). We report experimental results obtained on two large, standard datasets for multilingual multilabel text classification, showing that this instance of GFUN substantially improves over FUN and over state-of-the-art baselines. Our code that implements GFUN is publicly available.
Pages: 37
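
To make the two-tier architecture described in the abstract more concrete, below is a minimal Python sketch of the idea, assuming scikit-learn. It shows only one view-generating function, the calibrated posterior probabilities used by FUN; in GFUN, further views (e.g., WCE, MUSE, or mBERT representations) would be concatenated in the same way before reaching the meta-classifier. This is an illustration of the idea, not the authors' publicly available implementation, and the function names (posterior_view, train_meta_classifier) are hypothetical.

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC, SVC


def posterior_view(train_docs, train_labels, docs):
    """View-generating function: train a language-specific classifier on
    TF-IDF features and return calibrated posterior probabilities, i.e.,
    one language-independent dimension per class (as in FUN)."""
    vectorizer = TfidfVectorizer(sublinear_tf=True)
    X_train = vectorizer.fit_transform(train_docs)
    X = vectorizer.transform(docs)
    clf = OneVsRestClassifier(CalibratedClassifierCV(LinearSVC()))
    clf.fit(X_train, train_labels)
    return clf.predict_proba(X)  # shape: (n_docs, n_classes)


def train_meta_classifier(data_by_lang, view_fns):
    """Train the meta-classifier on the language-independent views of all
    languages juxtaposed, which is what lets it exploit class-class (and,
    with further views, word-class / word-word / word-context) correlations.

    data_by_lang: {lang: (train_docs, train_label_matrix)}
    view_fns: functions with signature fn(train_docs, train_labels, docs)
              returning an (n_docs, d) matrix in a language-independent space.
    """
    views, labels = [], []
    for lang, (docs, y) in data_by_lang.items():
        # Concatenate all views produced for this language's training set.
        # (A fuller implementation would typically produce training-set
        # posteriors via k-fold cross-validation to avoid overfitting the
        # meta-classifier; that step is omitted here for brevity.)
        lang_views = [fn(docs, y, docs) for fn in view_fns]
        views.append(np.hstack(lang_views))
        labels.append(np.asarray(y))
    meta = OneVsRestClassifier(SVC(kernel="rbf", probability=True))
    meta.fit(np.vstack(views), np.vstack(labels))
    return meta


# Usage (hypothetical per-language training data, multilabel indicator matrices):
# meta = train_meta_classifier({"en": (en_docs, en_labels),
#                               "it": (it_docs, it_labels)},
#                              view_fns=[posterior_view])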