Evolving general term-weighting schemes for information retrieval: Tests on larger collections

Cited by: 10
Authors
Cummins, R [1 ]
O'Riordan, C [1 ]
Affiliation
[1] National University of Ireland, University College Galway, Department of Information Technology, Galway, Ireland
Keywords
genetic programming; information retrieval; term-weighting schemes
DOI
10.1007/s10462-005-9001-y
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector space model is one such model in which the weights assigned to the document terms are of crucial importance to the accuracy of the retrieval system. This paper describes a genetic programming framework used to automatically determine term-weighting schemes that achieve a high average precision. These schemes are tested on standard test collections and are shown to perform as well as, and often better than, the modern BM25 weighting scheme. We present an analysis of the schemes evolved to explain the increase in performance. Furthermore, we show that the global (collection-wide) part of the evolved weighting schemes also increases average precision over idf on larger TREC data. These global weighting schemes are shown to adhere to Luhn's resolving power as middle frequency terms are assigned the highest weight. However, the complete weighting schemes evolved on small collections do not perform as well on large collections. We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on large collections.
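The BM25 baseline mentioned in the abstract, and the split into a global (collection-wide) part and a local (within-document) part that the authors analyse, can be illustrated with a minimal Python sketch. It assumes the standard Okapi BM25 form with the commonly used defaults k1 = 1.2 and b = 0.75; this record does not state the paper's own parameter settings, and the function names are hypothetical. It shows the baseline only, not the evolved weighting schemes.

```python
import math


def rsj_idf(n_docs: int, df: int) -> float:
    """Global (collection-wide) factor: the Robertson-Sparck Jones idf used in classic BM25.

    Assumption: this classic variant, which becomes negative for terms that
    occur in more than half of the documents.
    """
    return math.log((n_docs - df + 0.5) / (df + 0.5))


def bm25_tf(tf: int, doc_len: int, avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Local (within-document) factor: saturating term frequency with document-length normalisation."""
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Sum of global * local weights over the query terms that occur in the document."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue  # terms absent from the document (or collection) contribute nothing
        score += rsj_idf(n_docs, df[term]) * bm25_tf(tf, doc_len, avg_doc_len, k1, b)
    return score


# Toy usage with made-up statistics (not data from the paper).
doc_tf = {"genetic": 2, "retrieval": 1}
df = {"genetic": 20, "retrieval": 300, "programming": 50}
print(bm25_score(["genetic", "programming", "retrieval"], doc_tf, 120, 100.0, df, 1000))
```

The idf factor depends only on collection statistics and the tf factor only on the document, which mirrors the global/local decomposition the abstract uses when comparing the evolved global weights against idf. Note that idf is monotonically decreasing in document frequency, whereas the abstract reports that the evolved global weights favour middle-frequency terms.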
Pages: 277–299
Number of pages: 23
Related papers (50 in total)
  • [41] Effects of central tendency measures on term weighting in textual information retrieval
    Ghahramani, Farzad
    Tahayori, Hooman
    Visconti, Andrea
    Soft Computing, 2021, 25 (11): 7341–7378
  • [42] Structural Information Based Term Weighting in Text Retrieval for Feature Location
    Bassett, Blake
    Kraft, Nicholas A.
    2013 IEEE 21st International Conference on Program Comprehension (ICPC), 2013: 133–141
  • [43] A probabilistic justification for using tf × idf term weighting in information retrieval
    Hiemstra, D.
    International Journal on Digital Libraries, 2000, 3 (2): 131–139
  • [45] An effective term weighting method using random walk model for information retrieval
    Islam, Md. Rafiqul
    Sarker, Buddha Dev
    Islam, Md. Rakibul
    2008 International Conference on Computer and Communication Engineering, Vols 1–3, 2008: 1357–1362
  • [46] Term frequency - function of document frequency: a new term weighting scheme for enterprise information retrieval
    Zhang, Hui
    Wang, Deqing
    Wu, Wenjun
    Hu, Hongping
    Enterprise Information Systems, 2012, 6 (4): 433–444
  • [47] Semi-parametric and Non-parametric Term Weighting for Information Retrieval
    Metzler, Donald
    Zaragoza, Hugo
    Advances in Information Retrieval Theory, 2009, 5766: 42–53
  • [48] A nonparametric term weighting method for information retrieval based on measuring the divergence from independence
    Kocabaş, İlker
    Dinçer, Bekir Taner
    Karaoğlan, Bahar
    Information Retrieval, 2014, 17 (2): 153–176
  • [50] Information Retrieval by Modified Term Weighting Method Using Random Walk Model with Query Term Position Ranking
    Arif, Abu Shamim Mohammad
    Rahman, Md Masudur
    Mukta, Shamima Yeasmin
    Proceedings of the 2009 International Conference on Signal Processing Systems, 2009: 526–530