An integrated system of mining HTML texts and filtering structured documents

Cited by: 0
Authors
Yun, BH
Lim, ME
Park, SH
Affiliations
[1] Elect & Telecommun Res Inst, Dept Human Informat Proc, Taejon 305350, South Korea
[2] Kookmin Univ, Sch Business IT, Seoul 136702, South Korea
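Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING | 2003 / Vol. 2637
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract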
This paper presents a method of mining HTML documents into structured documents and of filtering structured documents by using both slot weighting and token weighting. The goal of the mining algorithm is to find slot-token patterns in HTML documents. To express user interests in structured document filtering, both slots and tokens are considered. Our preference computation algorithm applies vector similarity and Bayesian probability to filter structured documents. The experimental results show that it is important to consider hyperlinking and unlabelling in mining HTML texts, and that slot and token weighting can enhance the performance of structured document filtering.
Pages: 350-355
Page count: 6