An analysis of hierarchical text classification using word embeddings

Cited by: 130
Authors
Stein, Roger Alan [1 ]
Jaques, Patricia A. [1 ]
Valiati, Joao Francisco [2 ]
Affiliations
[1] Univ Vale Rio Sinos UNISINOS, Programa Posgrad Comp Aplicada PPGCA, Av Unisinos 950, Sao Leopoldo, RS, Brazil
[2] AIE, Rua Vieira Castro 262, Porto Alegre, RS, Brazil
Keywords
Hierarchical text classification; Word embeddings; Gradient tree boosting; fastText; Support vector machines;
DOI
10.1016/j.ins.2018.09.001
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations (fastText, XGBoost, SVM, and Keras' CNN) and noticeable word embedding generation methods (GloVe, word2vec, and fastText) with publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an lcaF1 of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their flavors is a very promising approach for HTC. (C) 2018 Elsevier Inc. All rights reserved.
Pages: 216-232
Page count: 17
Related papers
50 in total
  • [41] Learning from Few Samples: Lexical Substitution with Word Embeddings for Short Text Classification
    Elekes, Abel
    Di Stefano, Antonino Simone
    Schaeler, Martin
    Boehm, Klemens
    Keller, Matthias
    2019 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2019), 2019, : 111 - 119
  • [42] Centroid-Means-Embedding: An Approach to Infusing Word Embeddings into Features for Text Classification
    Sohrab, Mohammad Golam
    Miwa, Makoto
    Sasaki, Yutaka
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART I, 2015, 9077 : 289 - 300
  • [43] Towards Useful Word Embeddings Evaluation on Information Retrieval, Text Classification, and Language Modeling
    Novotny, Vit
    Stefanik, Michal
    Luptak, David
    Sojka, Petr
    RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2020), 2020, : 37 - 46
  • [44] MULTITOPIC TEXT CLUSTERING AND CLUSTER LABELING USING CONTEXTUALIZED WORD EMBEDDINGS
    Ostapiuk, Z. V.
    Korotyeyeva, T. O.
    RADIO ELECTRONICS COMPUTER SCIENCE CONTROL, 2020, (04) : 95 - 105
  • [45] Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention
    Zeng, Xiangkai
    Yang, Cheng
    Tu, Cunchao
    Liu, Zhiyuan
    Sun, Maosong
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5650 - 5657
  • [46] Clinical Narrative Classification using Discriminant Word Embeddings with ELM
    Lauren, Paula
    Qu, Guangzhi
    Zhang, Feng
    Lendasse, Amaury
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 2931 - 2938
  • [47] A Large-scale Text Analysis with Word Embeddings and Topic Modeling
    Choi, Won-Joon
    Kim, Euhee
    JOURNAL OF COGNITIVE SCIENCE, 2019, 20 (01) : 147 - 187
  • [48] Enhancing Sensitivity Classification with Semantic Features Using Word Embeddings
    McDonald, Graham
    Macdonald, Craig
    Ounis, Iadh
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 450 - 463
  • [49] UTILIZING CONTEXTUALIZED WORD EMBEDDINGS FOR TEXT MATCHING
    Yu, Hao
    Chen, Xiaoyang
    Zhou, Ying
    PROCEEDINGS OF 2020 INTERNATIONAL CONFERENCE ON WAVELET ANALYSIS AND PATTERN RECOGNITION (ICWAPR), 2020, : 54 - 59
  • [50] Hierarchical Classification in Text Mining for Sentiment Analysis
    Li, Jinyan
    Fong, Simon
    Zhuang, Yan
    Khoury, Richard
    2014 INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE ISCMI 2014, 2014, : 46 - 51