Clustering of web search results based on the cuckoo search algorithm and Balanced Bayesian Information Criterion

Cited by: 69
Authors
Cobos, Carlos [1 ,2 ]
Munoz-Collazos, Henry [1 ]
Urbano-Munoz, Richar [1 ]
Mendoza, Martha [1 ,2 ]
Leon, Elizabeth [3 ]
Herrera-Viedma, Enrique [4 ,5 ]
Affiliations
[1] Univ Cauca, Informat Technol Res Grp GTI, Popayan, Colombia
[2] Univ Cauca, Dept Comp Sci, Elect & Telecommun Engn Fac, Popayan, Colombia
[3] Univ Nacl Colombia, Syst & Ind Engn, Fac Engn, Bogota, Colombia
[4] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
[5] King Abdulaziz Univ, Dept Elect & Comp Engn, Fac Engn, Jeddah 21589, Saudi Arabia
Keywords
Cuckoo search algorithm; Clustering of web result; Web document clustering; Balanced Bayesian Information Criterion; k-Means; K-MEANS; DESIGN
DOI
10.1016/j.ins.2014.05.047
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The clustering of web search results - or web document clustering - has become a very interesting research area among academic and scientific communities involved in information retrieval. Web search result clustering systems, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review, while reducing the time spent reviewing them. Several algorithms for clustering web results already exist, but results show room for more to be done. This paper introduces a new description-centric algorithm for the clustering of web results, called WDC-CSK, which is based on the cuckoo search meta-heuristic algorithm, k-means algorithm, Balanced Bayesian Information Criterion, split and merge methods on clusters, and frequent phrases approach for cluster labeling. The cuckoo search meta-heuristic provides a combined global and local search strategy in the solution space. Split and merge methods replace the original Levy flights operation and try to improve existing solutions (nests), so they can be considered as local search methods. WDC-CSK includes an abandon operation that provides diversity and prevents the population nests from converging too quickly. Balanced Bayesian Information Criterion is used as a fitness function and allows defining the number of clusters automatically. WDC-CSK was tested with four data sets (DMOZ-50, AMBIENT, MORESQUE and ODP-239) over 447 queries. The algorithm was also compared against other established web document clustering algorithms, including Suffix Tree Clustering (STC), Lingo, and Bisecting k-means. The results show a considerable improvement upon the other algorithms as measured by recall, F-measure, fall-out, accuracy and SSLk. (C) 2014 Elsevier Inc. All rights reserved.
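The abstract describes a population of candidate clusterings (nests) that are improved by split and merge moves, refined with k-means, scored by an information criterion, and partly abandoned each generation. The following is a minimal Python sketch of that loop under loud assumptions, not the authors' WDC-CSK: documents are replaced by plain numeric points, the fitness is an ordinary BIC-style score (the Balanced BIC formula is not given in this record), cluster labeling is omitted, and all function names and parameters (`cuckoo_cluster`, `pa`, `k_max`, and so on) are hypothetical.

```python
import math
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, iters=10):
    """A few Lloyd iterations; returns refined centers and the SSE."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    sse = sum(min(dist2(p, c) for c in centers) for p in points)
    return centers, sse

def evaluate(points, centers):
    """Assumed stand-in fitness (lower is better), NOT the Balanced BIC."""
    centers, sse = kmeans(points, list(centers))
    n, k = len(points), len(centers)
    score = n * math.log(max(sse, 1e-12) / n) + k * math.log(n)
    return score, centers

def local_move(points, centers, rng, k_max):
    """Split one cluster or merge two, then refine with k-means."""
    centers = list(centers)
    k = len(centers)
    do_split = k < 2 or (k < k_max and rng.random() < 0.5)
    if do_split:
        c = centers.pop(rng.randrange(k))
        jitter = [rng.gauss(0, 0.5) for _ in c]
        centers.append(tuple(a + j for a, j in zip(c, jitter)))
        centers.append(tuple(a - j for a, j in zip(c, jitter)))
    else:
        a, b = rng.sample(range(k), 2)
        merged = tuple((x + y) / 2 for x, y in zip(centers[a], centers[b]))
        centers = [c for i, c in enumerate(centers) if i not in (a, b)] + [merged]
    return evaluate(points, centers)

def cuckoo_cluster(points, n_nests=6, generations=15, pa=0.25, k_max=6, seed=0):
    """Return (score, centers) of the best nest found."""
    rng = random.Random(seed)

    def new_nest():
        k = rng.randint(2, k_max)
        return evaluate(points, rng.sample(points, k))

    nests = [new_nest() for _ in range(n_nests)]
    for _ in range(generations):
        # Local search: keep a split/merge candidate only if it scores better.
        for i in range(n_nests):
            candidate = local_move(points, nests[i][1], rng, k_max)
            if candidate[0] < nests[i][0]:
                nests[i] = candidate
        # Abandon the worst fraction of nests to keep the population diverse.
        nests.sort(key=lambda nest: nest[0])
        n_drop = int(pa * n_nests)
        for i in range(n_nests - n_drop, n_nests):
            nests[i] = new_nest()
    return min(nests, key=lambda nest: nest[0])
```

Because the number of centers changes through split and merge moves and the score penalizes larger k, the number of clusters is chosen by the search rather than fixed in advance, which is the role the abstract assigns to the fitness function.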
Pages: 248-264
Page count: 17
Related papers
50 in total
  • [21] Online Clustering Algorithm for Restructuring User Web Search Results
    Pavani, M.
    Teja, G. Ravi
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON FRONTIERS OF INTELLIGENT COMPUTING: THEORY AND APPLICATIONS (FICTA) 2014, VOL 1, 2015, 327 : 27 - 36
  • [22] Data Clustering Using Cuckoo Search Algorithm (CSA)
    Manikandan, P.
    Selvarajan, S.
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2012), 2014, 236 : 1275 - 1283
  • [24] A balanced hybrid cuckoo search algorithm for microscopic image segmentation
    Chakraborty, Shouvik
    Mali, Kalyani
    SOFT COMPUTING, 2024, 28 (06) : 5097 - 5124
  • [25] Cuckoo Search Algorithm with Deep Search
    Cai Zefan
    Yang Xiaodong
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2241 - 2246
  • [26] Dynamic clustering of web search results
    Yang, L
    Rahi, A
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2003, PT 1, PROCEEDINGS, 2003, 2667 : 153 - 159
  • [27] acsFSDPC: A Density-Based Automatic Clustering Algorithm with an Adaptive Cuckoo Search
    Liu, Chang
    Shang, Junliang
    Zhu, Xuhui
    Sun, Yan
    Liu, Jin-Xing
    Zheng, Chun-Hou
    Zhang, Junying
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, PT II, 2018, 10955 : 470 - 482
  • [28] Search Results Clustering Algorithm based on the Suffix Tree
    Wang, Dengwei
    Liu, Libo
    Dong, Jing
    Zheng, Jiao
    2015 2ND INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING ICISCE 2015, 2015, : 456 - 460
  • [29] A Novel Algorithm for Restructuring Web Search Results by Clustering Pseudo Documents
    Girdhar, Salve Bhagyashri
    Wagh, R. B.
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION PROCESSING (ICIP), 2015, : 795 - 800
  • [30] Web document clustering based on Global-Best Harmony Search, K-means, Frequent Term Sets and Bayesian Information Criterion
    Cobos, Carlos
    Andrade, Jennifer
    Constain, William
    Mendoza, Martha
    Leon, Elizabeth
    2010 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2010,