A New Approach to Search Result Clustering and Labeling

Cited by: 0
Authors
Turel, Anil [1]
Can, Fazli [1]
Affiliations
[1] Bilkent Univ, Dept Comp Engn, Bilkent Informat Retrieval Grp, TR-06800 Ankara, Turkey
Source
INFORMATION RETRIEVAL TECHNOLOGY | 2011 / Vol. 7097
Keywords
Cluster labeling; search result clustering; web information retrieval; INFORMATION; RETRIEVAL
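DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract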
Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing retrieval results to make desired information easier to access is an important research problem. In this paper, we present a novel search result clustering approach that splits the long list of documents returned by search engines into meaningfully grouped and labeled clusters. Our method emphasizes clustering quality by using cover coefficient-based and sequential k-means clustering algorithms. A cluster labeling method based on term weighting is also introduced to reflect cluster contents. In addition, we present a new metric that employs precision and recall to assess the success of cluster labeling. We adopt a comparative strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. Experimental results on the publicly available AMBIENT and ODP-239 datasets show that our method can successfully perform both the clustering and labeling tasks.
Pages: 283-292
Page count: 10
Related papers
22 in total
  • [1] [Anonymous], AMBIENT DATASET
  • [2] [Anonymous], 1997, P 10 RES COMP LING I
  • [3] Efficiency and effectiveness of query processing in cluster-based retrieval
    Can, F
    Altingövde, IS
    Demir, E
    [J]. INFORMATION SYSTEMS, 2004, 29 (08): 697-717
  • [4] CONCEPTS AND EFFECTIVENESS OF THE COVER-COEFFICIENT-BASED CLUSTERING METHODOLOGY FOR TEXT DATABASES
    CAN, F
    OZKARAHAN, EA
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 1990, 15 (04): 483-517
  • [5] Can F., 2008, P 31 ANN INT ACM SIG, P885
  • [6] Carbonell J., 1998, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P335, DOI 10.1145/290941.291025
  • [7] Carpineto C., 2009, ODP239 DATASET
  • [8] A Survey of Web Clustering Engines
    Carpineto, Claudio
    Osinski, Stanislaw
    Romano, Giovanni
    Weiss, Dawid
    [J]. ACM COMPUTING SURVEYS, 2009, 41 (03)
  • [9] Mobile Information Retrieval with Search Results Clustering: Prototypes and Evaluations
    Carpineto, Claudio
    Mizzaro, Stefano
    Romano, Giovanni
    Snidero, Matteo
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2009, 60 (05): 877-895
  • [10] DONG Z, 2002, THESIS SE U NANJING