A combined component approach for finding collection-adapted ranking functions based on genetic programming

Cited by: 30
Authors
Federal Univ. of Minas Gerais, Dept. of Computer Science, Belo Horizonte, Brazil [1]
Unknown [2]
Unknown [3]
Affiliations
[1] Federal Univ. of Minas Gerais, Dept. of Computer Science, Belo Horizonte
[2] FUCAPI, Analysis, Research and Tech. Innovation Center, Manaus
[3] IST/INESC-ID, Lisboa
Source
Proc. Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. | 2007 / pp. 399-406
Keywords
Genetic programming; Information retrieval; Machine learning; Ranking functions; Term-weighting
DOI
10.1145/1277741.1277810
Abstract
In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25 and another GP-based approach in two different collections. CCA obtained improvements in mean average precision of up to 40.87% for the TREC-8 collection, and 24.85% for the WBR99 collection (a large Brazilian Web collection), over the baseline functions. The CCA evolution process was also able to reduce the overtraining commonly found in machine learning methods, especially genetic programming, and to converge faster than the other GP-based approach used for comparison. Copyright 2007 ACM.
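The idea of using whole term-weighting components as GP terminals can be illustrated with a minimal sketch. It is not the authors' implementation: the component names (`tf_raw`, `tf_bm25`, `idf_log`), the nested-tuple tree encoding, the operator set, and the BM25 parameter defaults are all assumptions chosen for illustration. The sketch evaluates one candidate expression tree as a term weight and computes average precision, the per-query quantity behind the mean average precision figures reported in the abstract.

```python
import math

# Hypothetical terminals: each is a complete component taken from a
# classic weighting scheme rather than a raw collection statistic.
def tf_raw(tf, dl, avgdl):
    # Plain term frequency (document length is ignored here).
    return float(tf)

def tf_bm25(tf, dl, avgdl, k1=1.2, b=0.75):
    # BM25-style saturating term frequency with length normalization.
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))

def idf_log(n_docs, df):
    # Classic inverse document frequency.
    return math.log(n_docs / df)

# Assumed operator set for internal tree nodes.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(tree, terminals):
    # A tree is either a terminal name or a tuple (operator, left, right).
    if isinstance(tree, str):
        return terminals[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, terminals), evaluate(right, terminals))

def term_weight(tree, tf, df, dl, avgdl, n_docs):
    # Compute every terminal for one (term, document) pair, then evaluate.
    terminals = {
        "tf_raw": tf_raw(tf, dl, avgdl),
        "tf_bm25": tf_bm25(tf, dl, avgdl),
        "idf_log": idf_log(n_docs, df),
    }
    return evaluate(tree, terminals)

def average_precision(ranked_ids, relevant_ids):
    # Mean of the precision values at each rank holding a relevant document.
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

# One candidate individual: BM25-style tf multiplied by log idf.
candidate = ("*", "tf_bm25", "idf_log")
weight = term_weight(candidate, tf=3, df=10, dl=120, avgdl=100, n_docs=1000)
```

In a GP search of this kind, a fitness function would score each candidate tree by averaging `average_precision` over a set of training queries; selection, crossover, and mutation are omitted from this sketch.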
Pages: 399-406
Page count: 7