Web Document Clustering Research Based on Granular Computing

Cited by: 2
Authors
Zheng Shangzhi [1]
Zhao Xiaolong [1]
Zhang Buqun [1]
Bu Hualong [1]
Affiliations
[1] Chaohu Univ, Dept Comp Sci & Technol, Chaohu, Peoples R China
Source
PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE AND SECURITY, VOL II | 2009
Keywords
Granular computing; Clustering; Association rules; Web documents
DOI
10.1109/ISECS.2009.16
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents a web document clustering method based on granular computing (WDCGrc). The method computes the weight of each word in a document using the TF-IDF principle. A document threshold and the average weight value are then combined to reduce dimensionality and extract the keywords of each document. The paper establishes a mapping between document keywords and binary granules, and applies an association rule algorithm based on granular computing to obtain frequent itemsets across documents. Drawing on set theory, the number of words shared between documents is used as the document similarity, from which the clustering result is obtained. Experiments show that the method is practical and feasible and yields good clustering quality.
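The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WDCGrc implementation: the abstract does not give the exact threshold rule or the association rule algorithm, so the sketch assumes that a word is kept as a keyword when its TF-IDF weight is positive and at least the document's average weight, and it encodes each keyword as an integer bit string (bit i set when document i contains the keyword). All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: TF-IDF weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


def extract_keywords(weights):
    """Assumed rule: keep words with positive weight >= the document average."""
    kept = []
    for doc_weights in weights:
        if not doc_weights:
            kept.append(set())
            continue
        avg = sum(doc_weights.values()) / len(doc_weights)
        kept.append({w for w, v in doc_weights.items() if v > 0 and v >= avg})
    return kept


def binary_granules(keyword_sets):
    """Map each keyword to an int whose bit i is set if document i has it."""
    table = {}
    for i, keywords in enumerate(keyword_sets):
        for word in keywords:
            table[word] = table.get(word, 0) | (1 << i)
    return table


def shared_keywords(table, i, j):
    """Similarity of documents i and j: number of keywords present in both."""
    mask = (1 << i) | (1 << j)
    return sum(1 for bits in table.values() if bits & mask == mask)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["granular", "computing", "clustering", "the", "of"],
    ["granular", "computing", "rules", "the", "of"],
    ["stock", "market", "the", "of"],
    ["stock", "prices", "the", "of"],
]
granule_table = binary_granules(extract_keywords(tfidf_weights(docs)))
print(shared_keywords(granule_table, 0, 1))
print(shared_keywords(granule_table, 0, 2))
```

Because each granule is a bit string, checking whether a keyword occurs in a set of documents reduces to a bitwise AND, which is the usual motivation for binary granules in granular-computing association mining. Grouping documents by these pairwise counts would be the final clustering step, which the sketch leaves out.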
Pages: 446-450
Page count: 5
Related papers
6 in total
[1] Ayad H., 2002, ADV ARTIFICIAL INTEL, P161
[2] Lin T.Y., 1998, Rough Sets in Knowledge Discovery 2: Applications, Case Studies and Software Systems, P121
[3] Lin T.Y., 1998, Rough Sets in Knowledge Discovery, P107
[4] Lin T.Y., 1997, Announcement of the BISC Special Interest Group on Granular Computing
[5] Salton G., 1983, INTRO MODERN INFORM
[6] Yao Y.Y., 2000, PROCEEDINGS OF THE FIFTH JOINT CONFERENCE ON INFORMATION SCIENCES, VOLS 1 AND 2, P186