Multi-label Text Classification Using Semantic Features and Dimensionality Reduction with Autoencoders

Cited by: 9
Authors
Alkhatib, Wael [1]
Rensing, Christoph [1]
Silberbauer, Johannes [1]
Affiliations
[1] Tech Univ Darmstadt, Fachgebiet Multimedia Kommunikat, S3-20, Rundeturmstr 10, D-64283 Darmstadt, Germany
Source
LANGUAGE, DATA, AND KNOWLEDGE, LDK 2017 | 2017 / Volume 10318
Keywords
Semantics; Feature selection; Dimensionality reduction; Text classification; Semantic relations; Autoencoders
DOI
10.1007/978-3-319-59888-8_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is vital in text classification because it reduces the high dimensionality of the feature space. The many statistical techniques that have been proposed for weighting and selecting features lose the semantic relationships among concepts and ignore the dependencies and ordering between adjacent words. In this work, we propose two techniques for incorporating semantics into feature selection. Furthermore, we use autoencoders to transform the features into a reduced feature space in order to analyse the performance penalty of feature extraction. Our intensive experiments on the EUR-lex dataset showed that semantic-based feature selection techniques significantly outperform the Bag-of-Words (BOW) frequency-based feature selection method with term frequency-inverse document frequency (TF-IDF) feature weighting. In addition, after an aggressive dimensionality reduction of the original features by a factor of 10, the autoencoders still produce better features than BOW with TF-IDF.
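For orientation, the BOW baseline named in the abstract can be sketched roughly as follows. This is only an illustration of frequency-based term selection followed by TF-IDF weighting, not the authors' semantic method or their exact setup. The function name `tfidf_top_k`, the whitespace tokenization, the length-normalized term frequency, and the `log(N / df)` form of the inverse document frequency are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_top_k(docs, k):
    """Keep the k most frequent terms, then weight them with TF-IDF.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Frequency-based selection: rank terms by corpus frequency,
    # breaking ties alphabetically so the result is deterministic.
    corpus_counts = Counter(tok for toks in tokenized for tok in toks)
    ranked = sorted(corpus_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:k]]

    # Document frequency: number of documents containing each kept term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    n_docs = len(docs)

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        # TF (length-normalized count) times IDF (assumed log(N / df)).
        vectors.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors
```

Calling `tfidf_top_k(docs, k)` on a list of strings returns the selected vocabulary and one weight vector per document. A term that occurs in every document receives weight zero under this IDF variant, which is one reason purely statistical weighting can discard terms that still carry meaning.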
Pages: 380-394
Number of pages: 15