A method of dimensionality reduction by selection of components in principal component analysis for text classification

Cited by: 5
Authors
Zhang, Yangwu [1, 2]
Li, Guohe [1, 3]
Zong, Heng [2]
Affiliations
[1] China Univ Petr, Coll Geophys & Informat Engn, Beijing, Peoples R China
[2] China Univ Polit Sci & Law, Dept Sci & Technol Teaching, Beijing, Peoples R China
[3] China Univ Petr, Beijing Key Lab Data Min Petr Data, Beijing, Peoples R China
Keywords
Principal components analysis; Dimensionality reduction; Text classification
DOI
10.2298/FIL1805499Z
Chinese Library Classification number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Dimensionality reduction, which includes feature extraction and feature selection, is a key step in text classification. In this paper, we propose a hybrid dimensionality reduction method that combines principal component analysis (PCA) with a selection of components. PCA is a feature extraction method. Not all of its components contribute to classification, because PCA is not a form of discriminant analysis (see, e.g., Jolliffe, 2002). We therefore present a component selection function that returns the components useful for classification, using classification performance on different subsets of the components as the indicator. Compared with traditional feature selection methods, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
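The abstract does not spell out the component selection function, so the following is only a minimal sketch of the general idea, under stated assumptions: components are chosen by a greedy forward search scored on validation accuracy, and a nearest-centroid classifier stands in for the SVM so that the example needs only NumPy. The function names (`pca_fit`, `centroid_accuracy`, `select_components`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_fit(X):
    """Return the feature mean and the principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt

def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Validation accuracy of a nearest-centroid classifier (SVM stand-in)."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    # Squared distance from every validation point to every class centroid.
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_val).mean())

def select_components(Z_train, y_train, Z_val, y_val, max_components):
    """Greedy forward selection: add the component that most improves accuracy."""
    selected, best = [], 0.0
    remaining = list(range(Z_train.shape[1]))
    while remaining and len(selected) < max_components:
        scores = [
            centroid_accuracy(Z_train[:, selected + [j]], y_train,
                              Z_val[:, selected + [j]], y_val)
            for j in remaining
        ]
        k = int(np.argmax(scores))
        if scores[k] <= best:  # stop when no candidate improves the score
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best

# Synthetic data (assumption): feature 0 is high-variance noise, while the
# class signal lives in the low-variance feature 1.
rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5)) * np.array([10.0, 0.3, 3.0, 3.0, 3.0])
X[:, 1] += 2.0 * y - 1.0

X_train, y_train, X_val, y_val = X[:400], y[:400], X[400:], y[400:]
mean, axes = pca_fit(X_train)           # fit PCA on the training split only
Z_train = (X_train - mean) @ axes.T     # project onto all principal axes
Z_val = (X_val - mean) @ axes.T

selected, best = select_components(Z_train, y_train, Z_val, y_val, max_components=3)
print("selected components:", selected, "validation accuracy:", best)
```

In this toy setting the leading component captures mostly noise, which illustrates why ranking components by explained variance alone need not match their usefulness for classification.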
Pages: 1499-1506
Number of pages: 8