Web Document Clustering Research Based on Granular Computing

Cited by: 2

Authors:

Zheng Shangzhi [1]

Zhao Xiaolong [1]

Zhang Buqun [1]

Bu Hualong [1]

Affiliation:

[1] Chaohu Univ, Dept Comp Sci & Technol, Chaohu, Peoples R China

Source:

PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE AND SECURITY, VOL II | 2009

Keywords:

Granular computing; Clustering; Association rules; Web documents

DOI:

10.1109/ISECS.2009.16

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

This paper presents a method of web document clustering based on granular computing (WDCGrc). The method computes the weight of each word in a document using the TF-IDF principle. A combination of a document threshold and the average weight value is then used to reduce dimensionality and extract the keywords of each document. The paper establishes a transformation between the keywords in documents and binary granules, and applies an association rule algorithm based on granular computing to obtain frequent itemsets between documents. Drawing on set theory, the number of words shared between documents is taken as the document similarity, from which the clustering result is obtained. Experiments show that the method is practical and feasible, with good clustering quality.
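The abstract gives no implementation details, so the following Python sketch is only one possible reading of the pipeline it describes, not the authors' algorithm. It assumes that keywords are the words whose TF-IDF weight is at least the document's average weight, that each document is encoded as a binary granule (a bitmask over the keyword vocabulary), and that documents are merged into a cluster when they share at least `min_shared` keywords. All function names, the `min_shared` parameter, and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_weights(docs):
    """Return one {word: tf * idf} dict per tokenized document."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [
        {w: (c / len(doc)) * math.log(n / df[w]) for w, c in Counter(doc).items()}
        for doc in docs
    ]


def extract_keywords(weights):
    """Assumed rule: keep words with a positive weight >= the document average."""
    if not weights:
        return set()
    avg = sum(weights.values()) / len(weights)
    return {w for w, v in weights.items() if v > 0 and v >= avg}


def to_granules(keyword_sets):
    """Encode each keyword set as an integer bitmask over a shared vocabulary."""
    vocab = sorted(set().union(*keyword_sets))
    bit = {w: 1 << k for k, w in enumerate(vocab)}
    return [sum(bit[w] for w in kws) for kws in keyword_sets]


def shared_count(g1, g2):
    """Number of keywords two documents share: popcount of the bitwise AND."""
    return bin(g1 & g2).count("1")


def cluster(granules, min_shared):
    """Merge documents sharing >= min_shared keywords (simple union-find)."""
    parent = list(range(len(granules)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(granules)), 2):
        if shared_count(granules[i], granules[j]) >= min_shared:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(granules)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy corpus (made up for illustration): "page" occurs everywhere, so its IDF is 0.
docs = [
    ["page", "granular", "granular", "computing", "computing", "clustering"],
    ["page", "granular", "granular", "computing", "computing", "rules"],
    ["page", "football", "football", "goal", "goal", "match"],
    ["page", "football", "football", "goal", "goal", "league"],
]
keyword_sets = [extract_keywords(w) for w in tfidf_weights(docs)]
granules = to_granules(keyword_sets)
clusters = cluster(granules, min_shared=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Encoding documents as bitmasks means the shared-keyword count reduces to a bitwise AND followed by a bit count, which is the kind of operation binary granules are typically used for. The paper's actual thresholds and frequent-itemset procedure may differ from this sketch.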


Pages: 446-450

Page count: 5