The automatic classification of web pages based on neural network

Cited by: 0
Authors
Zhang, YZ [1]
Zhao, MS [1]
Wu, YS [1]
Affiliation
[1] Tsing Hua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Source
8TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING, VOLS 1-3, PROCEEDING | 2001
Keywords
Kohonen SOFM; classification; feature extraction
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classifying web pages is an important task. It may also require a technique for extracting field information as common knowledge, and compound-word processing is another important factor when extracting keywords from web pages. In this method, the tour fields are first defined systematically, and the information related to the fields is then extracted. A new feature extraction method is considered that incorporates three kinds of information: text, HTML tags, and hyperlinks. Accordingly, this paper presents a neural network algorithm, the self-organizing feature map (SOFM), for the automatic classification of web pages. The proposed approach combines a new set of features with a self-organizing neural network classifier. The feature set reflects page content, is selected using a statistical reduction procedure, and provides information on text keywords, hyperlinks, and HTML tags. The final feature set is then used as the input vector to a neural network that performs the classification, assigning web pages to different classes. A series of experiments was conducted to evaluate the performance of the approach, and the results are promising.
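The pipeline described in the abstract (one feature vector built from text keywords, HTML tags, and hyperlinks, then fed to a self-organizing feature map) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the record gives no details of the feature set, the statistical reduction step, the map size, or the training schedule, so the function names (`extract_features`, `best_unit`, `train_som`), the keyword and tag lists, the 2x2 map, and the linearly decaying learning rate and neighborhood radius are all assumptions chosen for brevity.

```python
import math
import random
import re

def extract_features(html, keywords, tags):
    """Build one unit-length vector: keyword counts, tag counts, hyperlink count."""
    lower = html.lower()
    text = re.sub(r"<[^>]+>", " ", lower)           # drop tags, keep visible text
    words = re.findall(r"[a-z]+", text)
    vec = [float(words.count(k)) for k in keywords]
    vec += [float(len(re.findall(r"<%s\b" % re.escape(t), lower))) for t in tags]
    vec.append(float(len(re.findall(r"<a\s[^>]*href", lower))))
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]

def best_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; units are stored row by row in a flat list."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / float(epochs)
        lr = lr0 * (1.0 - frac)                       # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)       # assumed shrinking radius
        for x in data:
            br, bc = divmod(best_unit(weights, x), cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2    # squared grid distance
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])    # pull unit toward the input
    return weights

# Hypothetical toy pages and vocabulary, for illustration only.
keywords = ["hotel", "flight", "score", "team"]
tags = ["table", "img"]
pages = [
    "<p>hotel and flight deals</p><a href='/book'>book</a><img src='a.png'>",
    "<table><tr><td>team score</td></tr></table><a href='/league'>league</a>",
]
data = [extract_features(p, keywords, tags) for p in pages]
som = train_som(data, rows=2, cols=2)
units = [best_unit(som, x) for x in data]            # map unit assigned to each page
```

After training, each map unit would still need to be labeled (for example by the majority class of the training pages that land on it) before it can be used as a classifier; that labeling step is likewise not described in this record.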
Pages: 570-575
Page count: 6