Word-class embeddings for multiclass text classification

Cited by: 22
Authors
Moreo, Alejandro [1]
Esuli, Andrea [1]
Sebastiani, Fabrizio [1]
Affiliation
[1] CNR, Ist Sci & Tecnol Informaz, I-56124 Pisa, Italy
Funding
European Union Horizon 2020
Keywords
Word-class embeddings; Word embeddings; Distributional hypothesis; Multiclass text classification; Neural text classification; REPRESENTATIONS
DOI
10.1007/s10618-020-00735-3
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using six popular neural architectures and six widely used and publicly available datasets for multiclass text classification. One further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
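The abstract describes concatenating supervised word-class vectors to unsupervised pre-trained word vectors. The plain-Python sketch below illustrates that idea under explicit assumptions. The association score (the share of a word's occurrences that fall in training documents of each class), the zero-filling of missing parts, the function names `word_class_embeddings` and `concatenate_embeddings`, and the toy data are all hypothetical. None of them is taken from the paper or its repository, whose weighting and normalisation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def word_class_embeddings(
    docs: Sequence[Sequence[str]], labels: Sequence[int], n_classes: int
) -> Dict[str, Vector]:
    """Map each word to a vector with one entry per class.

    Hypothetical scoring: entry c is the share of the word's occurrences
    found in training documents labelled c (a row-normalised X^T Y built
    from raw counts). This is an illustrative choice, not the paper's.
    """
    counts: Dict[str, Vector] = defaultdict(lambda: [0.0] * n_classes)
    for tokens, label in zip(docs, labels):
        for token in tokens:
            counts[token][label] += 1.0
    # Every word occurs at least once, so each row sum is positive.
    return {w: [v / sum(row) for v in row] for w, row in counts.items()}


def concatenate_embeddings(
    pretrained: Dict[str, Vector],
    wce: Dict[str, Vector],
    pre_dim: int,
    n_classes: int,
) -> Dict[str, Vector]:
    """Concatenate unsupervised and supervised vectors word by word.

    Assumption: a word missing from either table gets zeros for that part.
    """
    vocab = set(pretrained) | set(wce)
    return {
        w: list(pretrained.get(w, [0.0] * pre_dim))
        + list(wce.get(w, [0.0] * n_classes))
        for w in vocab
    }


# Toy example: class 0 and class 1 are hypothetical topic labels.
docs = [["goal", "match"], ["goal", "vote"], ["vote", "election"]]
labels = [0, 0, 1]
wce = word_class_embeddings(docs, labels, n_classes=2)

# Toy stand-in for a real pre-trained embedding table.
pretrained = {"goal": [0.1, 0.2], "vote": [0.3, 0.4]}
combined = concatenate_embeddings(pretrained, wce, pre_dim=2, n_classes=2)
print(combined["vote"])  # [0.3, 0.4, 0.5, 0.5]
```

In this sketch each word's final vector is its pre-trained part followed by one supervised dimension per class, so the input layer of a downstream classifier grows by exactly the number of classes.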
Pages: 911-963
Number of pages: 53