A combined component approach for finding collection-adapted ranking functions based on genetic programming

Cited by: 30
Authors
Federal Univ. of Minas Gerais, Dept. of Computer Science, Belo Horizonte, Brazil [1]
Not specified [2]
Not specified [3]
Affiliations
[1] Federal Univ. of Minas Gerais, Dept. of Computer Science, Belo Horizonte
[2] FUCAPI, Analysis, Research and Tech. Innovation Center, Manaus
[3] IST/INESC-ID, Lisboa
Source
Proc. Annu. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. | 2007 | pp. 399-406
Keywords
Genetic programming; Information retrieval; Machine learning; Ranking functions; Term-weighting
DOI
10.1145/1277741.1277810
Abstract
In this paper, we propose a new method, based on Genetic Programming (GP), to discover collection-adapted ranking functions. Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, and normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not simple statistical information about a document collection, but meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25, and another GP-based approach on two different collections. Over the baseline functions, CCA obtained improvements in mean average precision (MAP) of up to 40.87% on the TREC-8 collection and 24.85% on the WBR99 collection (a large Brazilian Web collection). The CCA evolution process was also able to reduce overtraining, which is common in machine learning methods and especially in genetic programming, and to converge faster than the other GP-based approach used for comparison. Copyright 2007 ACM.
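To make the idea of GP terminals that are whole term-weighting components (rather than raw collection statistics) more concrete, the following Python sketch encodes a candidate ranking function as an expression tree. It also shows how random trees and subtree crossover produce new candidates. This is a minimal illustration under stated assumptions, not the authors' implementation. The component set (raw_tf, log_tf, idf, bm25_tf, len_norm), the BM25 constants k1 = 1.2 and b = 0.75, the operator set with protected division, and all helper names (TermStats, evaluate, score, random_tree, crossover) are hypothetical choices made for this example.

```python
import math
import operator
import random
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple, Union

@dataclass
class TermStats:
    """Statistics for one (query term, document) pair; hypothetical schema."""
    tf: int             # occurrences of the term in the document
    df: int             # documents in the collection containing the term
    n_docs: int         # documents in the collection
    doc_len: int        # document length in terms (assumed > 0)
    avg_doc_len: float  # average document length in the collection

K1, B = 1.2, 0.75  # assumed BM25 constants

# GP terminals: whole term-weighting components (an illustrative set, not the paper's).
COMPONENTS: Dict[str, Callable[[TermStats], float]] = {
    "raw_tf": lambda s: float(s.tf),
    "log_tf": lambda s: 1.0 + math.log(s.tf) if s.tf > 0 else 0.0,
    "idf": lambda s: math.log(s.n_docs / s.df),
    "bm25_tf": lambda s: s.tf * (K1 + 1) / (s.tf + K1 * (1 - B + B * s.doc_len / s.avg_doc_len)),
    "len_norm": lambda s: s.avg_doc_len / s.doc_len,
}

def protected_div(a: float, b: float) -> float:
    # Common GP guard: return 1.0 instead of dividing by (near) zero.
    return a / b if abs(b) > 1e-9 else 1.0

# GP functions (internal nodes).
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "*": operator.mul,
    "/": protected_div,
}

# A tree is either a component name or (operator, left subtree, right subtree).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

def evaluate(tree: Tree, s: TermStats) -> float:
    """Term weight produced by a candidate ranking function."""
    if isinstance(tree, str):
        return COMPONENTS[tree](s)
    op, left, right = tree
    return OPERATORS[op](evaluate(left, s), evaluate(right, s))

def score(tree: Tree, query_terms: Iterable[TermStats]) -> float:
    """Document score: sum of the per-term weights over the query terms."""
    return sum(evaluate(tree, s) for s in query_terms)

def random_tree(rng: random.Random, depth: int) -> Tree:
    """Grow a random tree, e.g. to seed an initial population."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(sorted(COMPONENTS))
    return (rng.choice(sorted(OPERATORS)),
            random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def subtrees(tree: Tree, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], Tree]]:
    """Yield (path, subtree) pairs; a path is a sequence of tuple indices."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace_at(tree: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng: random.Random, a: Tree, b: Tree) -> Tree:
    """Subtree crossover: replace a random subtree of `a` with one taken from `b`."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace_at(a, path, donor)

if __name__ == "__main__":
    stats = TermStats(tf=3, df=10, n_docs=1000, doc_len=120, avg_doc_len=100.0)
    parent_a = ("*", "bm25_tf", "idf")      # a BM25-like weight built from components
    parent_b = ("/", "log_tf", "len_norm")
    child = crossover(random.Random(42), parent_a, parent_b)
    print(child, score(child, [stats]))
```

In this sketch the tree ("*", "bm25_tf", "idf") gives a BM25-style term weight. Its idf is the simple log(N/df) form rather than BM25's exact formula. Crossover splices components from two parent functions into a new candidate. A full GP run would also rank documents with each candidate and select by a retrieval-quality fitness, such as MAP on training queries. That step is omitted here.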
Pages: 399-406
Number of pages: 7
References
26 in total
  • [1] Allan J., Callan J.P., Feng F., Malin D., INQUERY and TREC-8, Proceedings of TREC-8, pp. 637-644, (1999)
  • [2] Baeza-Yates R., Ribeiro-Neto B., Modern Information Retrieval, (1999)
  • [3] Bartell B.T., Cottrell G.W., Belew R.K., Automatic combination of multiple ranked retrieval systems, Proceedings of the 17th ACM SIGIR, pp. 173-181, (1994)
  • [4] Buckley C., Singhal A., Mitra M., New retrieval approaches using SMART: TREC 4, Proceedings of TREC-4, pp. 25-48, (1996)
  • [5] Fan W., Fox E.A., Pathak P., Wu H., The effects of fitness functions on genetic programming-based ranking discovery for web search, Journal of the American Society for Information Science and Technology, 55, 7, pp. 628-636, (2004)
  • [6] Fan W., Gordon M., Pathak P., On linear mixture of expert approaches to information retrieval, Decision Support Systems, 42, 2, pp. 975-987, (2006)
  • [7] Fan W., Gordon M.D., Pathak P., Personalization of search engine services for effective retrieval and knowledge management, Proceedings of the 21st Intern. Conf. on Inf. Systems, pp. 20-34, (2000)
  • [8] Fan W., Gordon M.D., Pathak P., Discovery of context-specific ranking functions for effective information retrieval using genetic programming, IEEE Transactions on Knowledge and Data Engineering, 16, 4, pp. 523-527, (2004)
  • [9] Fan W., Gordon M.D., Pathak P., A generic ranking function discovery framework by genetic programming for information retrieval, Information Processing and Management, 40, 4, pp. 587-602, (2004)
  • [10] Fan W., Gordon M.D., Pathak P., Genetic programming-based discovery of ranking functions for effective web search, Journal of Management Information Systems, 21, 4, pp. 37-56, (2005)