Evolving general term-weighting schemes for information retrieval: Tests on larger collections

Cited by: 10
Authors
Cummins, R [1]
O'Riordan, C [1]
Affiliations
[1] Natl Univ Ireland Univ Coll Galway, Dept Informat Technol, Galway, Ireland
Keywords
genetic programming; information retrieval; term-weighting schemes
DOI
10.1007/s10462-005-9001-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector space model is one such model in which the weights assigned to the document terms are of crucial importance to the accuracy of the retrieval system. This paper describes a genetic programming framework used to automatically determine term-weighting schemes that achieve a high average precision. These schemes are tested on standard test collections and are shown to perform as well as, and often better than, the modern BM25 weighting scheme. We present an analysis of the evolved schemes to explain the increase in performance. Furthermore, we show that the global (collection-wide) part of the evolved weighting schemes also increases average precision over idf on larger TREC data. These global weighting schemes are shown to adhere to Luhn's resolving power, as middle-frequency terms are assigned the highest weight. However, the complete weighting schemes evolved on small collections do not perform as well on large collections. We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on large collections.
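The abstract distinguishes a global (collection-wide) weight, such as idf, from a local (within-document) weight, and uses BM25 as its baseline. As a rough illustration only, the Python sketch below shows the commonly cited Okapi BM25 form with that split made explicit. The function names and the defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, not values taken from the paper, and the evolved schemes themselves are not reproduced here.

```python
import math


def idf(num_docs, doc_freq):
    """Plain inverse document frequency: a purely global (collection-wide) weight."""
    return math.log(num_docs / doc_freq)


def bm25_weight(tf, doc_freq, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document (illustrative sketch).

    k1 and b are assumed, commonly used defaults, not values from the paper.
    """
    # Global part: Robertson/Sparck Jones style idf over the whole collection.
    global_part = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Local part: saturating term frequency with document-length normalisation.
    length_norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    local_part = tf * (k1 + 1) / (tf + length_norm)
    return local_part * global_part


if __name__ == "__main__":
    # A term occurring 3 times in an average-length document,
    # present in 10 of 1000 documents.
    print(idf(1000, 10))
    print(bm25_weight(3, 10, 1000, 100, 100))
```

Under these assumptions the local part saturates as the term frequency grows, while the global part depends only on how many documents contain the term, which is the component the abstract reports as transferring to larger collections.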
Pages: 277-299
Number of pages: 23