An experimental comparison of a genetic algorithm and a hill-climber for term selection

被引：3

作者：

MacFarlane, A. ^{[1
]}

Secker, A. ^{[2
]}

May, P. ^{[2
]}

Timmis, J. ^{[3
,4
]}

机构：

[1] City Univ London, Dept Informat Sci, Ctr Interact Syst Res, London, England

[2] Univ Kent, Comp Lab, Canterbury, Kent, England

[3] Univ York, Dept Elect, York YO10 5DD, N Yorkshire, England

[4] Univ York, Dept Comp Sci, York YO10 5DD, N Yorkshire, England

来源：

JOURNAL OF DOCUMENTATION | 2010年 / 66卷 / 04期

关键词：

Information searches; Programming and algorithm theory; Retrieval performance evaluation; Indexing; INFORMATION-RETRIEVAL; RELEVANCE FEEDBACK; QUERY; OKAPI;

D O I：

10.1108/00220411011052939

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Purpose - The term selection problem for selecting query terms in information filtering and routing has been investigated using hill-climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches, which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill-climbers are, however, likely to get trapped in local optima, and the use of more sophisticated local search techniques for this problem that attempt to break out of these optima are worth investigating. To this end, this paper aims to apply a genetic algorithm (GA) to the same problem. Design/methodology/approach - A standard TREC test collection is used from the TREC-8 filtering track, recording mean average precision and recall measures to allow comparison between the hill-climber and GAs. It also varies elements of the GA, such as probability of a word being included, probability of mutation and population size in order to measure the effect of these variables. Different strategies such as elitist and non-elitist methods are used, as well as roulette wheel and rank selection GAs. Findings - The results of tests suggest that both techniques are, on average, better than the baseline, but, the implemented GA does not match the overall performance of a hill-climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method make much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes. Research limitations/implications - The evidence provided here would suggest that being stuck in a local optima for the term selection optimization problem does not appear to be detrimental to the overall success of the hill-climber. The evidence from term rank order would appear to provide extra useful evidence, which hill climbers can use efficiently, and effectively, to narrow the search space. Originality/value - The paper represents the first attempt to compare hill-climbers with GAs on a problem of this type.

引用

页码：513 / 531

页数：19

共 36 条

[1]

[Anonymous], NIST SPECIAL PUBLICA

[2]

[Anonymous], 1999, INTRO GENETIC ALGORI

[3]

BEAULIEU M, 1997, NIST SPECIAL PUBLICA, P143

[4] On using genetic algorithms for multimodal relevance optimization in information retrieval [J].

Boughanem, M ;

Chrisment, C ;

Tamine, L .

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2002, 53 (11) :934-942

[5] A new query reweighting method for document retrieval based on genetic algorithms [J].

Chang, Yu-Chuan ;

Chen, Shyi-Ming .

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2006, 10 (05) :617-622

[6]

CHEN HC, 1995, J AM SOC INFORM SCI, V46, P194, DOI 10.1002/(SICI)1097-4571(199504)46:3<194::AID-ASI4>3.0.CO

[7]

2-S

[8]

Chen HC, 1998, J AM SOC INFORM SCI, V49, P693, DOI 10.1002/(SICI)1097-4571(199806)49:8<693::AID-ASI4>3.0.CO

[9]

2-O

[10] A generic ranking function discovery framework by genetic programming for information retrieval [J].

Fan, WG ;

Gordon, MD ;

Pathak, P .

INFORMATION PROCESSING & MANAGEMENT, 2004, 40 (04) :587-602

← 1 2 3 4 →