Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9

Authors:

Alkhatib, Wael [1]

Rensing, Christoph [1]

Silberbauer, Johannes [1]

Affiliations:

[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20, Rundeturmstr 10, D-64283 Darmstadt, Germany

Source:

LANGUAGE, DATA, AND KNOWLEDGE, LDK 2017 | 2017 / Vol. 10318

Keywords:

Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders

DOI:

10.1007/978-3-319-59888-8_32

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Feature selection is vital in text classification because it reduces the high dimensionality of the feature space. The many statistical techniques proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our intensive experiments on the EUR-Lex dataset showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, after an aggressive reduction of the original feature dimensionality by a factor of 10, the autoencoders still produce better features than BOW with TF-IDF.
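
The BOW baseline named in the abstract can be pictured with a small sketch. The record does not give the paper's tokenization, selection criterion, or TF-IDF variant, so everything below is an assumption made for illustration: whitespace tokenization, selection of the k most frequent corpus terms, and the plain tf * log(N/df) weighting. The function name `tfidf_top_k` and the toy sentences are hypothetical and are not taken from EUR-Lex.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k):
    """Keep the k most frequent terms and return TF-IDF vectors over them.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Frequency-based selection (assumption): rank terms by corpus-wide count,
    # breaking ties alphabetically so the result is deterministic.
    corpus_tf = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(corpus_tf.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:k]]

    # Document frequency: number of documents containing each term.
    n_docs = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vectors.append([counts[term] / length * idf[term] for term in vocab])
    return vocab, vectors


# Toy corpus, invented for illustration only.
docs = [
    "the court ruled on the trade regulation",
    "the regulation covers fishing quotas",
    "the council adopted the trade directive",
]
vocab, vectors = tfidf_top_k(docs, k=4)
```

In this toy run the most frequent term, `the`, occurs in every document and therefore gets a TF-IDF weight of zero, which hints at why purely frequency-based selection can retain uninformative features.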

Pages: 380-394

Page count: 15

50 items in total

[21] MULTI-LABEL TEXT CLASSIFICATION WITH A ROBUST LABEL DEPENDENT REPRESENTATION
Alfaro, Rodrigo
Allende, Hector
2011 INTERNATIONAL CONFERENCE ON INSTRUMENTATION, MEASUREMENT, CIRCUITS AND SYSTEMS (ICIMCS 2011), VOL 3: COMPUTER-AIDED DESIGN, MANUFACTURING AND MANAGEMENT, 2011, : 211 - 214
[22] Latent Semantic Indexing and Convolutional Neural Network for Multi-Label and Multi-Class Text Classification
Quispe, Oscar
Ocsa, Alexander
Coronado, Ricardo
2017 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2017,
[23] Dual dimensionality reduction on instance-level and feature-level for multi-label data
Li, Haikun
Fang, Min
Wang, Peng
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (35) : 24773 - 24782
[25] Dimensionality Reduction Using Convolutional Autoencoders
Mittal, Shweta
Sangwan, Om Prakash
ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY AND COMPUTING, AICTC 2021, 2022, 392 : 507 - 516
[26] Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms
Bromuri, Stefano
Zufferey, Damien
Hennebert, Jean
Schumacher, Michael
JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 51 : 165 - 175
[27] A novel reasoning mechanism for multi-label text classification
Wang, Ran
Ridley, Robert
Su, Xi'ao
Qu, Weiguang
Dai, Xinyu
INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (02)
[28] Multi-label legal text classification with BiLSTM and attention
Enamoto, Liriam
Santos, Andre R. A. S.
Maia, Ricardo
Weigang, Li
Rocha Filho, Geraldo P.
INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2022, 68 (04) : 369 - 378
[29] Multi-label Classification of Cybersecurity Text with Distant Supervision
Ishii, Masahiro
Mori, Kento
Kuwana, Ryoichi
Matsuura, Satoshi
PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, ARES 2022, 2022,
[30] Effective Multi-Label Active Learning for Text Classification
Yang, Bishan
Sun, Jian-Tao
Wang, Tengjiao
Chen, Zheng
KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009, : 917 - 925
