Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1 ]
Rensing, Christoph [1 ]
Silberbauer, Johannes [1 ]
Affiliation
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20,Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
LANGUAGE, DATA, AND KNOWLEDGE, LDK 2017 | 2017, Vol. 10318
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is vital in text classification because it reduces the high dimensionality of the feature space. The many statistical techniques proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our extensive experiments on the EUR-Lex dataset showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, even after an aggressive reduction of the original feature dimensionality by a factor of 10, the autoencoders still produce better features than BOW with TF-IDF.
Pages: 380-394
Number of pages: 15
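The baseline and the reduction step named in the abstract (BOW features weighted by TF-IDF, then an autoencoder that compresses them by a factor of 10) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy documents, the smoothed IDF variant, the one-hidden-layer architecture, and the helper names `tfidf_matrix` and `train_autoencoder` are hypothetical and are not taken from the paper. The paper's two semantic feature selection techniques are not described in the abstract, so they are not sketched.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of term j in doc i."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so ubiquitous terms keep weight 1.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows


def train_autoencoder(rows, hidden, epochs=200, lr=0.1, seed=0):
    """Train a sigmoid-encoder / linear-decoder autoencoder with plain SGD.

    Returns the encoder function and the mean squared reconstruction
    error recorded before training and after every epoch.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(d)]
    b2 = [0.0] * d

    def encode(x):
        return [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])))
            for j in range(hidden)
        ]

    def decode(h):
        return [sum(w * hj for w, hj in zip(w2[i], h)) + b2[i] for i in range(d)]

    def loss():
        total = 0.0
        for x in rows:
            total += sum((o - xi) ** 2 for o, xi in zip(decode(encode(x)), x))
        return total / len(rows)

    history = [loss()]
    for _ in range(epochs):
        for x in rows:
            h = encode(x)
            err = [o - xi for o, xi in zip(decode(h), x)]  # gradient up to a factor of 2
            # Back-propagate to the hidden layer before the decoder weights change.
            dh = [
                sum(err[i] * w2[i][j] for i in range(d)) * h[j] * (1.0 - h[j])
                for j in range(hidden)
            ]
            for i in range(d):
                for j in range(hidden):
                    w2[i][j] -= lr * err[i] * h[j]
                b2[i] -= lr * err[i]
            for j in range(hidden):
                for k in range(d):
                    w1[j][k] -= lr * dh[j] * x[k]
                b1[j] -= lr * dh[j]
        history.append(loss())
    return encode, history


# Toy corpus (invented for this example, loosely in the style of legal titles).
docs = [
    "council regulation on agricultural import tariffs",
    "commission decision on fisheries quotas",
    "council directive on import licences for agricultural products",
    "regulation on state aid for fisheries",
]
vocab, rows = tfidf_matrix(docs)
code_size = max(1, len(vocab) // 10)  # reduce the feature space by a factor of 10
encode, history = train_autoencoder(rows, code_size)
codes = [encode(x) for x in rows]  # reduced features a classifier could consume
```

In this sketch the compressed vectors in `codes` would replace the TF-IDF rows as classifier input. A real experiment would use a much larger vocabulary and a tuned network, which is where a factor-of-10 reduction becomes meaningful.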