Text classification using embeddings: a survey

Cited by: 7
Authors
da Costa, Liliane Soares [1]
Oliveira, Italo L. [1]
Fileto, Renato [1]
Affiliations
[1] Fed Univ Santa Catarina UFSC, Dept Informat & Stat INE, Campus Reitor Joao David Ferreira Lima, BR-88040900 Florianopolis, SC, Brazil
Keywords
Text classification; Feature representation; Embeddings; LABEL; DOCUMENT;
DOI
10.1007/s10115-023-01856-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which can vary with the context. Embeddings have recently emerged as a means to circumvent these limitations, allowing considerable performance gains. However, determining the best combinations of classification techniques and embeddings for classifying particular corpora can be challenging. This survey provides a comprehensive review of text classification approaches that employ embeddings. First, it analyzes past and recent advancements in feature representation for text classification. Then, it identifies the combinations of embedding-based feature representations and classification techniques that have provided the best performances for classifying text from distinct corpora, also providing links to the original articles, source code (when available) and data sets used in the performance evaluation. Finally, it discusses current challenges and promising directions for text classification research, such as cost-effectiveness, multi-label classification, and the potential of knowledge graphs and knowledge embeddings to enhance text classification.
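As a minimal illustration of the limitation the abstract opens with, the sketch below shows how a bag-of-words representation discards word order. It is not taken from the surveyed article: the helper name `bag_of_words`, the whitespace tokenization, and the two example sentences are assumptions chosen only for demonstration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: count lowercase, whitespace-separated tokens.

    Only token frequencies are kept; the position of each token is lost.
    """
    return Counter(text.lower().split())


# Two sentences with opposite meanings but identical word counts.
first = bag_of_words("the dog bit the man")
second = bag_of_words("the man bit the dog")

# The comparison holds because both sentences contain the same tokens
# with the same frequencies, so the representation cannot tell them apart.
print(first == second)
```

Embedding-based representations, the subject of the survey, are one way to retain some of the contextual information that this count-based view drops.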
Pages: 2761-2803
Number of pages: 43