Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1]
Rensing, Christoph [1]
Silberbauer, Johannes [1]
Affiliations
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20,Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
LANGUAGE, DATA, AND KNOWLEDGE, LDK 2017 | 2017, Vol. 10318
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is vital in text classification because it reduces the high dimensionality of the feature space. The many statistical techniques proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our extensive experiments on the EUR-Lex dataset show that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency/inverse document frequency (TF-IDF) feature weighting. In addition, after an aggressive dimensionality reduction of the original features by a factor of 10, the autoencoders still produce better features than BOW with TF-IDF.
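The baseline and the reduction step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not give their architecture, hyperparameters, or semantic feature extraction, so the toy corpus, the single sigmoid hidden layer, the learning rate, and the helper names `tfidf` and `train_autoencoder` are all assumptions made for illustration. It only shows BOW TF-IDF weighting followed by an autoencoder whose hidden layer is a factor of 10 smaller than the input.

```python
import numpy as np

def tfidf(docs):
    """Bag-of-words TF-IDF matrix (documents x vocabulary), rows L2-normalised."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.lower().split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)                           # document frequency per term
    idf = np.log((1.0 + len(docs)) / (1.0 + df)) + 1.0  # smoothed IDF (one common variant)
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms), vocab

def train_autoencoder(x, factor=10, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder (sigmoid encoder, linear decoder, MSE loss)
    with plain gradient descent. The hidden size is the input size divided by `factor`."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    h = max(1, d // factor)
    w1 = rng.normal(0.0, 0.1, (d, h))
    b1 = np.zeros(h)
    w2 = rng.normal(0.0, 0.1, (h, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        z = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))  # encode
        out = z @ w2 + b2                          # decode
        err = out - x
        losses.append(float((err ** 2).mean()))
        g_out = 2.0 * err / err.size               # gradient of the mean squared error
        g_w2 = z.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ w2.T) * z * (1.0 - z)     # backpropagate through the sigmoid
        g_w1 = x.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    def encode(m):
        """Map TF-IDF rows to the reduced feature space."""
        return 1.0 / (1.0 + np.exp(-(m @ w1 + b1)))

    return encode, losses

# Hypothetical toy corpus standing in for EUR-Lex documents.
docs = [
    "council regulation on agricultural products and market prices",
    "commission decision concerning state aid for regional airports",
    "directive on protection of personal data in electronic communications",
    "regulation laying down rules for import of fishery products",
    "decision on financial aid for agricultural market intervention",
    "directive concerning safety of electronic equipment and data networks",
]
x, vocab = tfidf(docs)
encode, losses = train_autoencoder(x, factor=10)
codes = encode(x)  # reduced features, about one tenth of the vocabulary size
print(x.shape, "->", codes.shape)
```

In a multi-label setting, `codes` would replace the TF-IDF rows as the classifier input; comparing classifier quality on `x` and on `codes` is one way to measure the performance penalty of the reduction that the abstract refers to.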
Pages: 380-394
Page count: 15
Related papers
50 in total
  • [31] A NEW INPUT REPRESENTATION FOR MULTI-LABEL TEXT CLASSIFICATION
    Alfaro, Rodrigo
    Allende, Hector
    2011 INTERNATIONAL CONFERENCE ON INSTRUMENTATION, MEASUREMENT, CIRCUITS AND SYSTEMS (ICIMCS 2011), VOL 3: COMPUTER-AIDED DESIGN, MANUFACTURING AND MANAGEMENT, 2011, : 207 - 210
  • [32] Semantic Feature Analysis for Multi-Label Text Classification on Topics of the Al-Quran Verses
    Mediamer, Gugun
    Adiwijaya
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2024, 20 (01): : 1 - 12
  • [33] Text Classification Based on a Novel Ensemble Multi-Label Learning Method
    Zhang, Tao
    Wu, Jiansheng
    Hu, Haifeng
    2014 2ND INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2014, : 964 - 968
  • [34] Training-Less Multi-label Text Classification Using Knowledge Bases and Word Embeddings
    Alkhatib, Wael
    Schnitzer, Steffen
    Rensing, Christoph
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT II, 2019, 11776 : 97 - 104
  • [35] Semi-supervised multi-label dimensionality reduction learning based on minimizing redundant correlation of specific and common features
    Li, Runxin
    Zhou, Gaozhi
    Li, Xiaowu
    Jia, Lianyin
    Shang, Zhenhong
    KNOWLEDGE-BASED SYSTEMS, 2024, 294
  • [36] Multi-Label Classification Using Dependent and Independent Dual Space Reduction
    Pacharawongsakda, Eakasit
    Theeramunkong, Thanaruk
    COMPUTER JOURNAL, 2013, 56 (09) : 1113 - 1135
  • [37] Learning label-specific features with global and local label correlation for multi-label classification
    Weng, Wei
    Wei, Bowen
    Ke, Wen
    Fan, Yuling
    Wang, Jinbo
    Li, Yuwen
    APPLIED INTELLIGENCE, 2023, 53 (03) : 3017 - 3033
  • [38] Deep Matrix Factorization With Complementary Semantic Aggregation for Micro-Video Multi-Label Classification
    Jing, Peiguang
    Liu, Xiaoyu
    Wang, Xuehui
    Su, Yuting
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1685 - 1689
  • [39] MULTI-RELATION MESSAGE PASSING FOR MULTI-LABEL TEXT CLASSIFICATION
    Ozmen, Muberra
    Zhang, Hao
    Wang, Pengyun
    Coates, Mark
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3583 - 3587
  • [40] Learning label-specific features with global and local label correlation for multi-label classification
    Wei Weng
    Bowen Wei
    Wen Ke
    Yuling Fan
    Jinbo Wang
    Yuwen Li
    Applied Intelligence, 2023, 53 : 3017 - 3033