Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1]
Rensing, Christoph [1]
Silberbauer, Johannes [1]
Affiliation
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20,Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
LANGUAGE, DATA, AND KNOWLEDGE, LDK 2017 | 2017 / Vol. 10318
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is of vital concern in text classification because it reduces the high dimensionality of the feature space. The wide range of statistical techniques proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our intensive experiments on the EUR-lex dataset showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, after an aggressive dimensionality reduction of the original features by a factor of 10, the autoencoders are still capable of producing better features than BOW with TF-IDF.
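The two building blocks named in the abstract, TF-IDF weighting of bag-of-words features and an autoencoder that compresses them by a factor of 10, can be illustrated with a minimal sketch. It is not the authors' implementation: the four toy documents, the function names (`tfidf`, `train_linear_autoencoder`, `encode`), the purely linear encoder/decoder, and the hyperparameters are all assumptions chosen to keep the example small and dependency-free.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Weight each tokenized document by term frequency * inverse document frequency."""
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(len(docs) / doc_freq[term]) for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] / len(doc) * idf[term] for term in vocab])
    return vocab, rows

def train_linear_autoencoder(X, k, epochs=200, lr=0.5, seed=0):
    """Fit encoder W (k x d) and decoder V (d x k) by SGD on squared reconstruction error."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in X:
            h = [sum(W[j][i] * x[i] for i in range(d)) for j in range(k)]  # code
            y = [sum(V[i][j] * h[j] for j in range(k)) for i in range(d)]  # reconstruction
            err = [y[i] - x[i] for i in range(d)]
            total += sum(e * e for e in err)
            # Back-propagate the error to the code before V changes
            # (the constant factor 2 of the gradient is absorbed into lr).
            grad_h = [sum(V[i][j] * err[i] for i in range(d)) for j in range(k)]
            for i in range(d):
                for j in range(k):
                    V[i][j] -= lr * err[i] * h[j]
            for j in range(k):
                for i in range(d):
                    W[j][i] -= lr * grad_h[j] * x[i]
        losses.append(total / len(X))
    return W, losses

def encode(W, x):
    """Project a TF-IDF vector into the reduced feature space."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Hypothetical toy corpus (not taken from EUR-lex).
docs = [
    "council regulation on import of agricultural products".split(),
    "commission decision on state aid to agricultural producers".split(),
    "directive on protection of personal data in electronic communications".split(),
    "regulation on customs duties for import of steel products".split(),
]
vocab, X = tfidf(docs)
k = max(1, len(vocab) // 10)  # reduce dimensionality by a factor of 10
W, losses = train_linear_autoencoder(X, k)
codes = [encode(W, x) for x in X]
```

Here `codes` holds one reduced vector per document and would replace the TF-IDF rows as classifier input. A term that occurs in every document (such as "on" in this corpus) receives an IDF of zero and therefore contributes nothing to either representation.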
Pages: 380 - 394
Page count: 15
Related papers
50 in total
  • [21] MULTI-LABEL TEXT CLASSIFICATION WITH A ROBUST LABEL DEPENDENT REPRESENTATION
    Alfaro, Rodrigo
    Allende, Hector
    2011 INTERNATIONAL CONFERENCE ON INSTRUMENTATION, MEASUREMENT, CIRCUITS AND SYSTEMS (ICIMCS 2011), VOL 3: COMPUTER-AIDED DESIGN, MANUFACTURING AND MANAGEMENT, 2011, : 211 - 214
  • [22] Latent Semantic Indexing and Convolutional Neural Network for Multi-Label and Multi-Class Text Classification
    Quispe, Oscar
    Ocsa, Alexander
    Coronado, Ricardo
    2017 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2017,
  • [23] Dual dimensionality reduction on instance-level and feature-level for multi-label data
    Li, Haikun
    Fang, Min
    Wang, Peng
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (35) : 24773 - 24782
  • [25] Dimensionality Reduction Using Convolutional Autoencoders
    Mittal, Shweta
    Sangwan, Om Prakash
    ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY AND COMPUTING, AICTC 2021, 2022, 392 : 507 - 516
  • [26] Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms
    Bromuri, Stefano
    Zufferey, Damien
    Hennebert, Jean
    Schumacher, Michael
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 51 : 165 - 175
  • [27] A novel reasoning mechanism for multi-label text classification
    Wang, Ran
    Ridley, Robert
    Su, Xi'ao
    Qu, Weiguang
    Dai, Xinyu
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (02)
  • [28] Multi-label legal text classification with BiLSTM and attention
    Enamoto, Liriam
    Santos, Andre R. A. S.
    Maia, Ricardo
    Weigang, Li
    Rocha Filho, Geraldo P.
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2022, 68 (04) : 369 - 378
  • [29] Multi-label Classification of Cybersecurity Text with Distant Supervision
    Ishii, Masahiro
    Mori, Kento
    Kuwana, Ryoichi
    Matsuura, Satoshi
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, ARES 2022, 2022,
  • [30] Effective Multi-Label Active Learning for Text Classification
    Yang, Bishan
    Sun, Jian-Tao
    Wang, Tengjiao
    Chen, Zheng
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 917 - 925