Using genetic algorithms to evolve a population of topical queries

Cited by: 15
Authors
Cecchini, Rocio L. [1, 2]
Lorenzetti, Carlos M. [1, 3]
Maguitman, Ana G. [1, 3]
Brignole, Nelida Beatriz [1, 2, 4]
Affiliations
[1] Univ Nacl Sur, Dept Ciencias & Ingn Computac, RA-8000 Bahia Blanca, Buenos Aires, Argentina
[2] Univ Nacl Sur, LIDeCC, RA-8000 Bahia Blanca, Buenos Aires, Argentina
[3] Univ Nacl Sur, LIDIA, RA-8000 Bahia Blanca, Buenos Aires, Argentina
[4] Planta Piloto Ingn Quim UNS CONICET, RA-8000 Bahia Blanca, Buenos Aires, Argentina
Keywords
Web search; Context; Genetic algorithms; Query formulation; Novelty
DOI
10.1016/j.ipm.2007.12.012
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function to be optimized is based on the effectiveness of a query to retrieve relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve "good query terms" in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query space, and discuss the use of a specially developed fitness function that favors the construction of queries containing novel but related terms. (C) 2007 Elsevier Ltd. All rights reserved.
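The abstract describes the approach only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy in-memory corpus in place of a real search engine, a hypothetical fitness function (context overlap multiplied by the share of terms not in the context), and a mutation pool that grows with terms from retrieved documents. All names (CORPUS, CONTEXT, search, fitness, evolve) and parameter values are assumptions made for illustration.

```python
import random

# Toy stand-in for a search engine index: each "document" is a set of terms.
CORPUS = [
    {"genetic", "algorithm", "mutation", "fitness", "population"},
    {"genetic", "crossover", "selection", "evolution"},
    {"query", "search", "engine", "web", "retrieval"},
    {"query", "terms", "context", "topic", "retrieval"},
    {"cooking", "recipe", "pasta"},
]
# Hypothetical thematic context: terms describing the topic of interest.
CONTEXT = {"genetic", "algorithm", "query", "retrieval", "topic"}


def search(query):
    """Toy retrieval: documents sharing at least one term with the query."""
    return [doc for doc in CORPUS if doc & query]


def fitness(query):
    """Assumed score: best retrieved document by (relatedness * novelty)."""
    best = 0.0
    for doc in search(query):
        related = len(doc & CONTEXT) / len(CONTEXT)  # overlap with the context
        novel = len(doc - CONTEXT) / len(doc)        # share of unseen terms
        best = max(best, related * novel)
    return best


def evolve(generations=10, pop_size=6, query_len=3, mutation_rate=0.3, seed=0):
    """Evolve a population of queries; return (best query, mutation pool)."""
    rng = random.Random(seed)
    pool = set(CONTEXT)  # mutation pool starts with the context terms
    population = [set(rng.sample(sorted(pool), query_len))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Grow the mutation pool with terms from the retrieved documents.
        for query in population:
            for doc in search(query):
                pool |= doc
        # Keep the fitter half as parents (elitist selection).
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            union = sorted(a | b)
            # Crossover: draw the child's terms from both parents.
            child = set(rng.sample(union, min(query_len, len(union))))
            if rng.random() < mutation_rate:
                # Mutation: swap one term for a term from the pool
                # (the child may shrink by one if the new term is a duplicate).
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(sorted(pool)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness), pool


if __name__ == "__main__":
    best_query, final_pool = evolve()
    print(sorted(best_query), fitness(best_query))
```

Raising mutation_rate in this sketch makes queries draw more often on pool terms that were not in the original context, which loosely mirrors the exploration effect the abstract attributes to different mutation rates.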
Pages: 1863-1878
Page count: 16