Text Document Classification with PCA and One-Class SVM

Cited by: 6
Authors
Kumar, B. Shravan [1 ,2 ]
Ravi, Vadlamani [1 ]
Affiliations
[1] Inst Dev & Res Banking Technol, Ctr Excellence Analyt, Castle Hills Rd 1, Hyderabad 500057, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Andhra Pradesh, India
Source
PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON FRONTIERS IN INTELLIGENT COMPUTING: THEORY AND APPLICATIONS, FICTA 2016, VOL 1 | 2017 / Vol. 515
Keywords
Text mining; Dimensionality reduction; Document classification; Principal component analysis; One-class support vector machine; Dimension reduction; Selection
DOI
10.1007/978-981-10-3153-3_11
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a document classifier based on principal component analysis (PCA) and a one-class support vector machine (OCSVM), in which PCA performs dimensionality reduction and OCSVM performs classification. First, PCA is applied to the document-term matrix and the top few principal components are retained. Next, OCSVM is trained on the records of the matrix that belong to the negative class. The trained OCSVM is then tested on the records of the matrix that belong to the positive class. The effectiveness of the proposed model is demonstrated on several popular datasets, viz., 20NG, malware, and Syskill & Webert, as well as on customer feedback from a bank. The hybrid yielded very high accuracy on all datasets.
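The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the TF-IDF weighting, the choice of three components, and the OCSVM settings (`kernel`, `gamma`, `nu`) are all assumptions, and scikit-learn is used here only for convenience (the reference list suggests the original work used MATLAB).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Hypothetical toy corpus: label 0 = negative class, label 1 = positive class.
docs = [
    "the bank approved my loan quickly",
    "great service at the bank branch",
    "the branch staff were helpful with my loan",
    "helpful staff and quick loan approval",
    "the atm swallowed my card and nobody helped",
    "terrible wait times and rude staff",
]
labels = np.array([0, 0, 0, 0, 1, 1])

# Step 1: build the document-term matrix (TF-IDF weighting is an assumption).
dtm = TfidfVectorizer().fit_transform(docs).toarray()

# Step 2: apply PCA and keep the top few components (3 is arbitrary here).
reduced = PCA(n_components=3).fit_transform(dtm)

# Step 3: train the one-class SVM on negative-class records only.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
ocsvm.fit(reduced[labels == 0])

# Step 4: test on positive-class records. predict() returns +1 for points
# that resemble the training class and -1 for points flagged as outliers.
pred_positive = ocsvm.predict(reduced[labels == 1])
print(pred_positive)
```

In this setup a positive-class document counts as correctly classified when the model flags it as an outlier (`-1`), since the OCSVM has only seen negative-class records during training.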
Pages: 107-115
Page count: 9