An analysis of hierarchical text classification using word embeddings

Cited by: 130
Authors
Stein, Roger Alan [1]
Jaques, Patricia A. [1]
Valiati, Joao Francisco [2]
Affiliations
[1] Univ Vale Rio Sinos UNISINOS, Programa Posgrad Comp Aplicada PPGCA, Av Unisinos 950, Sao Leopoldo, RS, Brazil
[2] AIE, Rua Vieira Castro 262, Porto Alegre, RS, Brazil
Keywords
Hierarchical text classification; Word embeddings; Gradient tree boosting; fastText; Support vector machines
DOI
10.1016/j.ins.2018.09.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvements on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations (fastText, XGBoost, SVM, and Keras' CNN) and notable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data, and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an LCA-F1 of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their variants is a very promising approach for HTC. (C) 2018 Elsevier Inc. All rights reserved.
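The abstract notes that the models were scored with measures suited to the hierarchical setting rather than flat accuracy. As a minimal sketch of why such measures behave differently, the Python below computes the classic ancestor-augmented hierarchical precision, recall and F1 (hP, hR, hF1) for single-label predictions on a tree: each label is expanded with its ancestors (excluding the root) and the sets are compared. This is a swapped-in, simpler measure and not the LCA-based F1 the paper reports, which, as we understand it, limits the augmentation to the paths up to the lowest common ancestor of the true and predicted labels. The `Hierarchy` parent map, the functions `augment` and `hierarchical_prf`, and the toy labels are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: each label maps to its parent; the root maps to None.
Hierarchy = Dict[str, Optional[str]]


def augment(label: str, parent: Hierarchy) -> Set[str]:
    """Return `label` together with all of its ancestors, excluding the root."""
    nodes: Set[str] = set()
    node = label
    # Walk upward until reaching the root (the only node whose parent is None).
    while (up := parent[node]) is not None:
        nodes.add(node)
        node = up
    return nodes


def hierarchical_prf(
    y_true: List[str], y_pred: List[str], parent: Hierarchy
) -> Tuple[float, float, float]:
    """Micro-averaged ancestor-augmented hP, hR, hF1 (NOT the paper's LCA-F1)."""
    overlap = true_size = pred_size = 0
    for t, p in zip(y_true, y_pred):
        t_aug, p_aug = augment(t, parent), augment(p, parent)
        overlap += len(t_aug & p_aug)  # shared nodes earn partial credit
        true_size += len(t_aug)
        pred_size += len(p_aug)
    h_p = overlap / pred_size if pred_size else 0.0
    h_r = overlap / true_size if true_size else 0.0
    h_f = 2 * h_p * h_r / (h_p + h_r) if h_p + h_r else 0.0
    return h_p, h_r, h_f


if __name__ == "__main__":
    # Toy two-level hierarchy with invented labels (not RCV1 topic codes).
    toy: Hierarchy = {"ROOT": None, "A": "ROOT", "B": "ROOT",
                      "A1": "A", "A2": "A", "B1": "B"}
    y_true = ["A1", "A1", "B1"]
    y_pred = ["A1", "A2", "A1"]  # exact hit, sibling error, cross-branch error
    print(hierarchical_prf(y_true, y_pred, toy))  # (0.5, 0.5, 0.5)
```

On the toy data the flat accuracy is 1/3, while hF1 is 0.5: the sibling error (A2 predicted for A1) still shares the parent A and is only partly penalized, whereas the cross-branch error (A1 for B1) earns nothing.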
Pages: 216-232
Number of pages: 17
Related papers
50 in total
  • [1] Text Classification Using Word Embeddings
    Helaskar, Mukund N.
    Sonawane, Sheetal S.
    2019 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2019,
  • [2] Using Word Embeddings with Linear Models for Short Text Classification
    Krzywicki, Alfred
    Heap, Bradford
    Bain, Michael
    Wobcke, Wayne
    Schmeidl, Susanne
    AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, 11320 : 819 - 827
  • [3] Text classification with semantically enriched word embeddings
    Pittaras, N.
    Giannakopoulos, G.
    Papadakis, G.
    Karkaletsis, V.
    NATURAL LANGUAGE ENGINEERING, 2021, 27 (04) : 391 - 425
  • [4] Word-class embeddings for multiclass text classification
    Moreo, Alejandro
    Esuli, Andrea
    Sebastiani, Fabrizio
    DATA MINING AND KNOWLEDGE DISCOVERY, 2021, 35 (03) : 911 - 963
  • [5] Word-class embeddings for multiclass text classification
    Alejandro Moreo
    Andrea Esuli
    Fabrizio Sebastiani
    Data Mining and Knowledge Discovery, 2021, 35 : 911 - 963
  • [6] Arabic Text Classification Based on Word and Document Embeddings
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 32 - 41
  • [7] Deep text classification of Instagram data using word embeddings and weak supervision
    Hammar, Kim
    Jaradat, Shatha
    Dokoohaki, Nima
    Matskin, Mihhail
    WEB INTELLIGENCE, 2020, 18 (01) : 53 - 67
  • [8] Emotion Detection from Text via Ensemble Classification Using Word Embeddings
    Herzig, Jonathan
    Shmueli-Scheuer, Michal
    Konopnicki, David
    ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE THEORY OF INFORMATION RETRIEVAL, 2017, : 269 - 272
  • [9] Automatic Text Summarization using Word Embeddings
    Easwar, Arjun
    Uthra, Annie
    PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021, : 1065 - 1079
  • [10] Text classification using embeddings: a survey
    Liliane Soares da Costa
    Italo L. Oliveira
    Renato Fileto
    Knowledge and Information Systems, 2023, 65 : 2761 - 2803