Word-class embeddings for multiclass text classification

Times cited: 22
Authors
Moreo, Alejandro [1]
Esuli, Andrea [1]
Sebastiani, Fabrizio [1]
Affiliation
[1] CNR, Ist Sci & Tecnol Informaz, I-56124 Pisa, Italy
Funding
European Union Horizon 2020;
Keywords
Word-class embeddings; Word embeddings; Distributional hypothesis; Multiclass text classification; Neural text classification; Representations;
DOI
10.1007/s10618-020-00735-3
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article), it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We provide empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using six popular neural architectures and six widely used, publicly available datasets for multiclass text classification. A further advantage of this method is that it is conceptually simple and straightforward to implement. Our code implementing WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
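The abstract does not spell out how WCEs are computed, so the following is only a minimal sketch under stated assumptions. It assumes each word's WCE is the vector of its association with each class, taken as the product X^T Y of a document-term weight matrix X (e.g. tf-idf, training documents only) and a binary document-label matrix Y, then z-scored per class column. The function names `word_class_embeddings` and `concat_embeddings`, the standardization step, and the toy data are hypothetical illustrations, not the authors' released code. The second step, concatenating WCEs to pre-trained embeddings row by row, is the combination the abstract describes.

```python
import numpy as np

def word_class_embeddings(X, Y):
    """Hypothetical WCE construction (an assumption, not the paper's exact recipe).

    X: (n_docs, n_words) document-term weights, e.g. tf-idf.
    Y: (n_docs, n_classes) binary label matrix.
    Returns an (n_words, n_classes) matrix: one class-association vector per word.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    W = X.T @ Y  # word-by-class association mass
    # Assumed normalization: z-score each class column so classes share a scale.
    mean = W.mean(axis=0, keepdims=True)
    std = W.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # guard against constant columns
    return (W - mean) / std

def concat_embeddings(pretrained, wce):
    """Append supervised WCE columns to unsupervised pre-trained vectors, row by row."""
    pretrained = np.asarray(pretrained, dtype=float)
    if pretrained.shape[0] != wce.shape[0]:
        raise ValueError("both matrices must index the same vocabulary")
    return np.hstack([pretrained, wce])

# Toy usage: 3 documents, 3 vocabulary words, 2 classes.
X = np.array([[1, 0, 2], [0, 1, 0], [1, 1, 0]])
Y = np.array([[1, 0], [0, 1], [1, 0]])
rng = np.random.default_rng(0)
E = rng.normal(size=(3, 4))  # stand-in for pre-trained word vectors
wce = word_class_embeddings(X, Y)
full = concat_embeddings(E, wce)
print(full.shape)  # (3, 6)
```

Under these assumptions, each word gains one extra dimension per class, and words with the same class distribution (here the first and third) receive identical WCE rows. The pre-trained columns are left untouched, so the supervised signal is added rather than overwriting general-purpose semantics.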
Pages: 911-963
Page count: 53
Related papers
  • [1] Word-class embeddings for multiclass text classification
    Moreo, Alejandro
    Esuli, Andrea
    Sebastiani, Fabrizio
    Data Mining and Knowledge Discovery, 2021, 35 : 911 - 963
  • [2] Text Classification Using Word Embeddings
    Helaskar, Mukund N.
    Sonawane, Sheetal S.
    2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019
  • [3] Text classification with semantically enriched word embeddings
    Pittaras, N.
    Giannakopoulos, G.
    Papadakis, G.
    Karkaletsis, V.
    NATURAL LANGUAGE ENGINEERING, 2021, 27 (04) : 391 - 425
  • [4] An analysis of hierarchical text classification using word embeddings
    Stein, Roger Alan
    Jaques, Patricia A.
    Valiati, Joao Francisco
    INFORMATION SCIENCES, 2019, 471 : 216 - 232
  • [5] Word-Class Transfers in Poetry and Prose
    Fonagy, I.
    LANGUAGE AND STYLE, 1982, 15 (04) : 227 - 240
  • [6] Arabic Text Classification Based on Word and Document Embeddings
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 32 - 41
  • [7] Joint Multiclass Debiasing of Word Embeddings
    Popovic, Radomir
    Lemmerich, Florian
    Strohmaier, Markus
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2020), 2020, 12117 : 79 - 89
  • [8] Investigating Word-Class Distributions in Word Vector Spaces
    Sasano, Ryohei
    Korhonen, Anna
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020 : 3657 - 3666
  • [9] Task-Optimized Word Embeddings for Text Classification Representations
    Gupta, Sukrat
    Kanchinadam, Teja
    Conathan, Devin
    Fung, Glenn
    FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS, 2020, 5
  • [10] Using Word Embeddings with Linear Models for Short Text Classification
    Krzywicki, Alfred
    Heap, Bradford
    Bain, Michael
    Wobcke, Wayne
    Schmeidl, Susanne
    AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 819 - 827