A method of dimensionality reduction by selection of components in principal component analysis for text classification

Cited by: 5
Authors
Zhang, Yangwu [1, 2]
Li, Guohe [1, 3]
Zong, Heng [2]
Affiliations
[1] China Univ Petr, Coll Geophys & Informat Engn, Beijing, Peoples R China
[2] China Univ Polit Sci & Law, Dept Sci & Technol Teaching, Beijing, Peoples R China
[3] China Univ Petr, Beijing Key Lab Data Min Petr Data, Beijing, Peoples R China
Keywords
Principal components analysis; Dimensionality reduction; Text classification;
DOI
10.2298/FIL1805499Z
CLC number
O29 [Applied Mathematics];
Subject classification code
070104;
Abstract
Dimensionality reduction, which covers both feature extraction and feature selection, is a key step in text classification. In this paper, we propose a hybrid dimensionality reduction method that combines principal component analysis (PCA) with a subsequent selection of components. PCA is a feature extraction method. Not all of its components contribute to classification, because the PCA objective is not a form of discriminant analysis (see, e.g., Jolliffe, 2002). We therefore present a component selection function that returns the components useful for classification, judged by classification performance indicators measured on different subsets of the components. Compared with traditional feature selection methods, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
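The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' function: the greedy forward search, the leave-one-out score, the nearest-centroid classifier (used here as a lightweight stand-in for the SVM), the toy data, and all function names are assumptions made for illustration only.

```python
import numpy as np

def pca_components(X, k):
    """Return (mean, W); the columns of W are the top-k principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]

def loo_centroid_accuracy(Z, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    classes = np.unique(y)
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        Zt, yt = Z[keep], y[keep]
        centroids = np.array([Zt[yt == c].mean(axis=0) for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - Z[i], axis=1))]
        hits += int(pred == y[i])
    return hits / len(y)

def select_components(Z, y):
    """Greedy forward selection: add a component only if the score improves."""
    selected, best = [], 0.0
    remaining = list(range(Z.shape[1]))
    while remaining:
        scores = [(loo_centroid_accuracy(Z[:, selected + [j]], y), j)
                  for j in remaining]
        # Highest score wins; ties go to the lower component index.
        score, j = max(scores, key=lambda s: (s[0], -s[1]))
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Toy data (hypothetical): feature 0 has high variance but carries no class
# information, feature 1 has low variance but separates the two classes,
# feature 2 is small class-independent variation.
a = np.array([-8.0, -4.0, 0.0, 4.0, 8.0])
b = np.array([0.1, -0.2, 0.2, -0.2, 0.1])
X = np.vstack([np.column_stack([a, np.full(5, -1.0), b]),
               np.column_stack([a, np.full(5, 1.0), b])])
y = np.array([0] * 5 + [1] * 5)

mean, W = pca_components(X, k=3)
Z = (X - mean) @ W                    # scores on the principal components
selected, best = select_components(Z, y)
print(selected, best)
```

On this toy data the search keeps only the second component (index 1) and drops the leading, highest-variance one, which is the situation the abstract describes: variance explained does not imply usefulness for classification.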
Pages: 1499-1506
Number of pages: 8
Related papers
33 in total
[1]   Guest Editors' Introduction to the Special Issue on Bayesian Nonparametrics [J].
Adams, Ryan P. ;
Fox, Emily B. ;
Sudderth, Erik B. ;
Teh, Yee Whye .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (02) :209-211
[2]  
[Anonymous], 2014, INT J REMOTE SENS, DOI 10.13140/2.1.1593.1684
[3]  
Bishop C. M., 2007, Technometrics, DOI 10.1198/TECH.2007.S518
[4]   Large-Scale Validation and Analysis of Interleaved Search Evaluation [J].
Chapelle, Olivier ;
Joachims, Thorsten ;
Radlinski, Filip ;
Yue, Yisong .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2012, 30 (01)
[5]   Turning from TF-IDF to TF-IGM for term weighting in text classification [J].
Chen, Kewen ;
Zhang, Zuping ;
Long, Jun ;
Zhang, Hao .
EXPERT SYSTEMS WITH APPLICATIONS, 2016, 66 :245-260
[6]  
CNNIC, 2017, CHIN STAT REP INT DE
[7]   SELECTION OF COMPONENTS IN PRINCIPAL COMPONENT ANALYSIS - A COMPARISON OF METHODS [J].
FERRE, L .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1995, 19 (06) :669-682
[8]   Combining supervised term-weighting metrics for SVM text classification with extended term representation [J].
Haddoud, Mounia ;
Mokhtari, Aicha ;
Lecroq, Thierry ;
Abdeddaim, Said .
KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 49 (03) :909-931
[9]  
Joachims T, 2006, LECT NOTES COMPUT SC, V4109, P1
[10]  
Joachims T, 2009, MACH LEARN, V77, P27, DOI 10.1007/s10994-009-5108-8