Text classification using embeddings: a survey

Cited by: 7
Authors
da Costa, Liliane Soares [1]
Oliveira, Italo L. [1]
Fileto, Renato [1]
Affiliations
[1] Fed Univ Santa Catarina UFSC, Dept Informat & Stat INE, Campus Reitor Joao David Ferreira Lima, BR-88040900 Florianopolis, SC, Brazil
Keywords
Text classification; Feature representation; Embeddings; LABEL; DOCUMENT;
DOI
10.1007/s10115-023-01856-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which can vary with the context. Embeddings have recently emerged as a means to circumvent these limitations, allowing considerable performance gains. However, determining the best combinations of classification techniques and embeddings for classifying particular corpora can be challenging. This survey provides a comprehensive review of text classification approaches that employ embeddings. First, it analyzes past and recent advancements in feature representation for text classification. Then, it identifies the combinations of embedding-based feature representations and classification techniques that have provided the best performances for classifying text from distinct corpora, also providing links to the original articles, source code (when available) and data sets used in the performance evaluation. Finally, it discusses current challenges and promising directions for text classification research, such as cost-effectiveness, multi-label classification, and the potential of knowledge graphs and knowledge embeddings to enhance text classification.
Pages: 2761-2803
Number of pages: 43
Related papers
50 in total
  • [1] Text classification using embeddings: a survey
    Liliane Soares da Costa
    Italo L. Oliveira
    Renato Fileto
    Knowledge and Information Systems, 2023, 65 : 2761 - 2803
  • [2] Text Classification Using Word Embeddings
    Helaskar, Mukund N.
    Sonawane, Sheetal S.
    2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019,
  • [3] An analysis of hierarchical text classification using word embeddings
    Stein, Roger Alan
    Jaques, Patricia A.
    Valiati, Joao Francisco
    INFORMATION SCIENCES, 2019, 471 : 216 - 232
  • [4] Text classification with document embeddings
    Huang, Chaochao (chaochaohuang12@fudan.edu.cn), 1600, Springer Verlag (8801):
  • [5] Text Classification with Document Embeddings
    Huang, Chaochao
    Qiu, Xipeng
    Huang, Xuanjing
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2014, 2014, 8801 : 131 - 140
  • [6] Using Word Embeddings with Linear Models for Short Text Classification
    Krzywicki, Alfred
    Heap, Bradford
    Bain, Michael
    Wobcke, Wayne
    Schmeidl, Susanne
    AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 819 - 827
  • [7] Text classification by untrained sentence embeddings
    Di Sarli, Daniele
    Gallicchio, Claudio
    Micheli, Alessio
    INTELLIGENZA ARTIFICIALE, 2020, 14 (02) : 245 - 259
  • [8] HWE: Hybrid Word Embeddings For Text Classification
    Song, Xuebo
    Srimani, Pradip K.
    Wang, James Z.
    NLPIR 2019: 2019 3RD INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, 2019, : 25 - 29
  • [9] A survey of word embeddings for clinical text
    Khattak F.K.
    Jeblee S.
    Pou-Prom C.
    Abdalla M.
    Meaney C.
    Rudzicz F.
    Journal of Biomedical Informatics: X, 2019, 4
  • [10] Development of a Text Classification Framework using Transformer-based Embeddings
    Yeasmin, Sumona
    Afrin, Nazia
    Saif, Kashfia
    Huq, Mohammad Rezwanul
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON DATA SCIENCE, TECHNOLOGY AND APPLICATIONS (DATA), 2022, : 74 - 82