SVM based Chinese web page automatic classification

Cited by: 4
Author
Liang, JZ [1]
Affiliation
[1] Zhejiang Normal Univ, Inst Comp Sci, Jinhua 321004, Peoples R China
Source
2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS | 2003
Keywords
support vector machine; statistic learning; web page; text classification; pattern recognition;
DOI
10.1109/ICMLC.2003.1259884
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper deals with the classification of Chinese web pages using a support vector machine (SVM). First, methods are proposed for extracting and selecting features based on textual keywords. Then, specific issues in statistical learning theory, the SVM, and their application to classification are discussed. A quadratic programming algorithm for constructing the SVM classifier is also described. In the experiments, a set of 5096 samples is drawn from the web edition of the Chinese People's Daily and split into a training set of 3398 samples and a test set of 1698 samples. Two kernel functions, polynomial and radial basis function (RBF), are used to construct SVM classifiers. The resulting classification accuracies are 89.81% and 86.51% for the two classifiers, respectively.
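The pipeline summarized in the abstract (keyword feature vectors, a kernel SVM obtained by solving a quadratic program, polynomial vs. RBF kernels) can be illustrated with a small, dependency-free sketch. This is not the author's implementation: the keyword list, the toy documents, the kernel parameters (degree=2, c=1.0, gamma=0.5), the penalty C=1.0, and all function names are hypothetical. The dual QP is solved here by cyclic coordinate ascent with the bias folded into the kernel (K + 1), a swapped-in solver that removes the equality constraint and leaves only box constraints; the paper's own QP algorithm may differ. Feature selection is not shown.

```python
import math

def tf_vector(tokens, vocab):
    """Term-frequency vector over a fixed keyword list.

    Assumes the Chinese text is already segmented into words; the keyword
    list is a hypothetical stand-in for the paper's selected features.
    """
    return [tokens.count(word) for word in vocab]

def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel K(x, z) = (x . z + c) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_svm(X, y, kernel, C=1.0, epochs=100):
    """Solve the SVM dual QP by cyclic coordinate ascent.

    The bias is folded into the kernel (K + 1), which drops the equality
    constraint sum(alpha_i * y_i) = 0 and leaves only 0 <= alpha_i <= C.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) + 1.0 for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Partial derivative of the dual objective with respect to alpha_i.
            grad = 1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Exact maximisation along coordinate i, then clip to the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + grad / K[i][i]))
    return alpha

def predict(x, X, y, alpha, kernel):
    """Return sign(sum_i alpha_i * y_i * (K(x_i, x) + 1)) as +1 or -1."""
    score = sum(a * yi * (kernel(xi, x) + 1.0)
                for a, yi, xi in zip(alpha, y, X) if a > 0)
    return 1 if score >= 0 else -1

# Toy two-category example (hypothetical data, not the People's Daily corpus).
vocab = ["team", "goal", "stock", "price"]
docs = [["team", "goal"], ["goal", "goal", "team"],
        ["stock", "price"], ["price", "price", "stock"]]
labels = [1, 1, -1, -1]  # +1 = sports, -1 = economy
X = [tf_vector(d, vocab) for d in docs]
tests = [tf_vector(["team", "team", "goal"], vocab), tf_vector(["stock"], vocab)]

for kernel in (poly_kernel, rbf_kernel):
    alpha = train_svm(X, labels, kernel)
    print(kernel.__name__, [predict(t, X, labels, alpha, kernel) for t in tests])
```

Both kernels label the first test document as sports (+1) and the second as economy (-1) on this toy data. Folding the bias into the kernel keeps the solver to a few lines at the cost of slightly regularizing the bias term; a production classifier would more likely rely on an established SVM library.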
Pages: 2265-2268
Number of pages: 4