A Fast Chinese Web-Document Clustering Method under Pareto's Principle

Cited by: 1
Authors
Zhang Tianlei [1]
Chen Guishen [2]
Che Hao [3]
Affiliations
[1] Tsinghua Univ Beijing, Dept Comp Sci & Technol, Beijing 100086, Peoples R China
[2] Inst Beijing Elect Syst Engn, Beijing 100141, Peoples R China
[3] Beijing City Univ, Artificial Intelligence Inst, Beijing 100083, Peoples R China
Source
2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2 | 2008
DOI
10.1109/GRC.2008.4664707
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most search engines today, such as Google and Baidu, present query results ranked by the value of each item and list them across several pages. In an age of information explosion, the number of result pages can be huge, and users have to skim through several of them before they find what they want. Clustering the results addresses this problem. Several clustering methods exist, but they are not sufficiently accurate or efficient, especially when the result set consists of millions of items. This article describes a fast method under Pareto's Principle.
Pages: 801 / +
Page count: 2