Clustering high dimensional data using SVM

Citations: 0
Authors
Lin, Tsau Young [1]
Ngo, Tam [1]
Affiliations
[1] San Jose State Univ, Dept Comp Sci, San Jose, CA 95192 USA
Source
ROUGH SETS, FUZZY SETS, DATA MINING AND GRANULAR COMPUTING, PROCEEDINGS | 2007 / Vol. 4482
Keywords
SVM; SVD; LSI; clustering; text classification; unsupervised;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Web contains such a massive number of documents that classifying them manually has become impossible. This project's goal is to find a new method for clustering documents that comes as close to human classification as possible while also reducing the size of the document data. The project combines Latent Semantic Indexing (LSI), computed via Singular Value Decomposition (SVD), with Support Vector Machine (SVM) classification. Using SVD, the data is decomposed and truncated to reduce its size. The reduced data is then clustered into categories. The clustered data from the SVD step is used to train an SVM so that new data can be classified based on the SVM's prediction. The project's results show that combining SVD and SVM reduces data size and classifies documents reasonably well compared with human classification.
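The pipeline described in the abstract (truncate with SVD, cluster the reduced vectors, train an SVM on the cluster labels, classify new documents) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, so a simple k-means is assumed here; the SVM is a small linear one trained by subgradient descent on the hinge loss rather than LIBSVM; and the toy term-by-document matrix `A`, the helper names, and all parameter values are hypothetical.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Truncated SVD: return k-dim document vectors and the term basis U_k."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    docs_k = Vt[:k].T * s[:k]          # one row per document (V_k * S_k)
    return docs_k, U[:, :k]

def kmeans(X, n_clusters, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation (assumed)."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM (labels -1/+1) via subgradient descent on hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1     # points inside or beyond the margin
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / len(y)
        grad_b = -y[viol].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_cluster(term_vector, U_k, w, b):
    """Fold a new document into the reduced space, then apply the SVM."""
    d_k = term_vector @ U_k            # same coordinates as rows of docs_k
    return int(d_k @ w + b > 0)

# Hypothetical toy corpus: rows are terms, columns are documents.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

docs_k, U_k = lsi_reduce(A, k=2)              # 6 terms reduced to 2 dimensions
labels = kmeans(docs_k, n_clusters=2)         # unsupervised cluster labels
w, b = train_linear_svm(docs_k, 2.0 * labels - 1.0)

new_doc_a = np.array([1, 1, 0, 0, 0, 0], dtype=float)
new_doc_b = np.array([0, 0, 0, 0, 1, 1], dtype=float)
pred_a = predict_cluster(new_doc_a, U_k, w, b)
pred_b = predict_cluster(new_doc_b, U_k, w, b)
```

In this sketch the SVM never sees human labels: it learns only from the cluster assignments produced on the reduced data, which is the unsupervised setup the abstract suggests. A multi-class setting would need one SVM per cluster or a library such as LIBSVM.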
Pages: 256 / +
Page count: 3