Clustering of web search results based on the cuckoo search algorithm and Balanced Bayesian Information Criterion

Cited by: 69
Authors
Cobos, Carlos [1 ,2 ]
Munoz-Collazos, Henry [1 ]
Urbano-Munoz, Richar [1 ]
Mendoza, Martha [1 ,2 ]
Leon, Elizabeth [3 ]
Herrera-Viedma, Enrique [4 ,5 ]
Affiliations
[1] Univ Cauca, Informat Technol Res Grp GTI, Popayan, Colombia
[2] Univ Cauca, Dept Comp Sci, Elect & Telecommun Engn Fac, Popayan, Colombia
[3] Univ Nacl Colombia, Syst & Ind Engn, Fac Engn, Popayan, Colombia
[4] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
[5] King Abdulaziz Univ, Dept Elect & Comp Engn, Fac Engn, Jeddah 21589, Saudi Arabia
Keywords
Cuckoo search algorithm; Clustering of web result; Web document clustering; Balanced Bayesian Information Criterion; k-Mean; K-MEANS; DESIGN;
DOI
10.1016/j.ins.2014.05.047
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The clustering of web search results, also known as web document clustering, has become an active research area in the academic and scientific communities working on information retrieval. Web search result clustering systems, also called Web Clustering Engines, seek to increase the coverage of the documents presented to the user while reducing the time spent reviewing them. Several algorithms for clustering web results already exist, but their results leave room for improvement. This paper introduces a new description-centric algorithm for clustering web results, called WDC-CSK, which is based on the cuckoo search meta-heuristic, the k-means algorithm, the Balanced Bayesian Information Criterion, split and merge methods on clusters, and a frequent-phrases approach to cluster labeling. The cuckoo search meta-heuristic provides a combined global and local search strategy over the solution space. Split and merge methods replace the original Levy flights operation and try to improve existing solutions (nests), so they can be regarded as local search methods. WDC-CSK includes an abandon operation that provides diversity and prevents the population of nests from converging too quickly. The Balanced Bayesian Information Criterion is used as the fitness function and allows the number of clusters to be determined automatically. WDC-CSK was tested on four data sets (DMOZ-50, AMBIENT, MORESQUE and ODP-239) comprising 447 queries. The algorithm was also compared against other established web document clustering algorithms, including Suffix Tree Clustering (STC), Lingo, and Bisecting k-means. The results show a considerable improvement over the other algorithms as measured by recall, F-measure, fall-out, accuracy and SSLk. © 2014 Elsevier Inc. All rights reserved.
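The search loop described in the abstract can be illustrated with a rough Python sketch. It is not the WDC-CSK implementation: the abstract gives neither the Balanced BIC formula nor the details of the split, merge, and abandon operations, so the sketch assumes a plain BIC-style score as a stand-in fitness, works on numeric points rather than web snippets, omits cluster labeling, and uses hypothetical function names and parameter values throughout.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Group points by nearest centroid; returns one list per centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        groups[j].append(p)
    return groups

def kmeans(points, centroids, iters=5):
    """A few Lloyd iterations; centroids that attract no points are dropped."""
    for _ in range(iters):
        groups = [g for g in assign(points, centroids) if g]
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
    return centroids

def fitness(points, centroids):
    """Plain BIC-style score, lower is better. A stand-in, NOT the paper's Balanced BIC."""
    n, d, k = len(points), len(points[0]), len(centroids)
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return n * math.log(sse / n + 1e-9) + k * d * math.log(n)

def split(points, centroids, rng):
    """Assumed split move: replace the highest-SSE centroid by two jittered copies."""
    groups = assign(points, centroids)
    sses = [sum(sq_dist(p, c) for p in g) for g, c in zip(groups, centroids)]
    i = max(range(len(centroids)), key=sses.__getitem__)
    a = tuple(x + rng.uniform(-0.5, 0.5) for x in centroids[i])
    b = tuple(x + rng.uniform(-0.5, 0.5) for x in centroids[i])
    return centroids[:i] + centroids[i + 1:] + [a, b]

def merge(centroids):
    """Assumed merge move: replace the two closest centroids by their midpoint."""
    if len(centroids) < 2:
        return list(centroids)
    pairs = [(i, j) for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    i, j = min(pairs, key=lambda ij: sq_dist(centroids[ij[0]], centroids[ij[1]]))
    mid = tuple((x + y) / 2 for x, y in zip(centroids[i], centroids[j]))
    return [c for t, c in enumerate(centroids) if t not in (i, j)] + [mid]

def random_nest(points, rng, k_max):
    """A nest is a list of centroids; the number of clusters is chosen at random."""
    return kmeans(points, rng.sample(points, rng.randint(1, k_max)))

def cuckoo_cluster(points, n_nests=6, generations=15, abandon_frac=0.25, k_max=6, seed=0):
    rng = random.Random(seed)
    nests = [random_nest(points, rng, k_max) for _ in range(n_nests)]
    for _ in range(generations):
        for i, nest in enumerate(nests):
            # Local move (in place of Levy flights): split or merge, then k-means.
            moved = split(points, nest, rng) if rng.random() < 0.5 else merge(nest)
            candidate = kmeans(points, moved)
            if fitness(points, candidate) < fitness(points, nest):
                nests[i] = candidate
        # Abandon the worst nests (never the best one) to keep diversity.
        nests.sort(key=lambda c: fitness(points, c))
        for i in range(n_nests - int(abandon_frac * n_nests), n_nests):
            nests[i] = random_nest(points, rng, k_max)
    return min(nests, key=lambda c: fitness(points, c))
```

Calling `cuckoo_cluster(points)` on a list of equal-length numeric tuples returns a list of centroids whose length is the number of clusters selected by the stand-in score. Because the penalty term grows with the number of clusters, that number does not have to be fixed in advance, which is the role the abstract assigns to the fitness function.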
Pages: 248-264
Page count: 17