A deep actor critic reinforcement learning framework for learning to rank

Cited by: 15
Authors
Padhye, Vaibhav [1 ]
Lakshmanan, Kailasam [1 ]
Affiliations
[1] IIT BHU, Dept Comp Sci, Varanasi, India
Keywords
Reinforcement learning; Learning to Rank; Deep reinforcement learning; Policy gradient
DOI
10.1016/j.neucom.2023.126314
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a deep reinforcement learning based approach to the learning-to-rank task. Reinforcement learning has been applied to ranking with good success, but existing policy gradient approaches suffer from noisy gradients and high variance, resulting in unstable learning. Classical policy gradient methods such as REINFORCE rely on Monte Carlo sampling, drawing trajectories at random, which leads to high variance. Moreover, when the action space becomes large, i.e., with a very large number of candidate documents, traditional RL techniques lack the model capacity required to deal with so many items. To address these issues, we propose a deep reinforcement learning based approach to learning to rank. By combining deep learning with the reinforcement learning framework, our approach can learn a complex ranking function, since deep neural networks are powerful function approximators. We use an actor-critic framework in which the critic network reduces variance through techniques such as delayed policy updates and clipped double Q-learning. In addition, because of the enormous scale of the web, the most relevant results for a query must be returned from within a large action space. Policy gradient algorithms with deep neural networks have been applied effectively to problems with large action spaces (items), as they do not rely on estimating a value for each action (item), unlike value-based methods. Further, we use an actor network with a CNN layer in the ranking process to capture sequential patterns among the documents. We train our reinforcement learning agent with the TD3 method and a listwise loss function; TD3 performs delayed policy updates, which yield value estimates with lower variance. To the best of our knowledge, this is the first deep reinforcement learning method applied to learning to rank for document retrieval. Experiments on several LETOR datasets show that our method outperforms various state-of-the-art baselines.
© 2023 Elsevier B.V. All rights reserved.
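
The variance-reduction machinery the abstract names comes from TD3: clipped double Q-learning and delayed policy updates. Below is a minimal PyTorch sketch of that update rule, not the authors' implementation; the network sizes, hyperparameters, and the use of LETOR-style feature vectors as states and actions are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 46, 46        # assumed: LETOR-style feature dimensions
GAMMA, TAU, POLICY_DELAY = 0.99, 0.005, 2
NOISE_STD, NOISE_CLIP = 0.2, 0.5      # TD3 target-policy smoothing noise

class Actor(nn.Module):
    """Deterministic policy mapping a state to a ranking-score action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, ACTION_DIM), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q(s, a) estimator; TD3 keeps two independent copies."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def td3_update(batch, actor, actor_t, q1, q2, q1_t, q2_t,
               critic_opt, actor_opt, step):
    s, a, r, s2, done = batch  # tensors sampled from a replay buffer
    with torch.no_grad():
        # Target-policy smoothing: clipped Gaussian noise on the target action.
        noise = (torch.randn_like(a) * NOISE_STD).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_t(s2) + noise).clamp(-1.0, 1.0)
        # Clipped double Q-learning: the minimum of the two target critics
        # curbs the overestimation that inflates variance.
        target = r + GAMMA * (1 - done) * torch.min(q1_t(s2, a2), q2_t(s2, a2))
    critic_loss = ((q1(s, a) - target) ** 2).mean() + ((q2(s, a) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    if step % POLICY_DELAY == 0:
        # Delayed policy update: the actor (and the target networks) are
        # refreshed less often, so it learns from steadier value estimates.
        actor_loss = -q1(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for net, tgt in ((actor, actor_t), (q1, q1_t), (q2, q2_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```

A single critic optimizer covering both Q-networks, e.g. `torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()))`, matches the usual TD3 reference setup.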
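The abstract also mentions a CNN layer in the actor to capture sequential patterns among the candidate documents, trained against a listwise loss. The sketch below is an illustration only: it scores a document list with a 1-D convolution and uses a ListNet-style top-1 softmax cross-entropy, since the abstract does not name a specific listwise loss; the feature size, kernel width, and the ListNet choice are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvScorer(nn.Module):
    """Scores a list of candidate documents for one query.
    Input:  (batch, n_docs, n_features) LETOR-style feature vectors.
    Output: (batch, n_docs) relevance scores."""
    def __init__(self, n_features=46, channels=64):
        super().__init__()
        # Conv1d over the document axis captures local sequential
        # patterns among neighbouring documents in the list.
        self.conv = nn.Conv1d(n_features, channels, kernel_size=3, padding=1)
        self.head = nn.Linear(channels, 1)
    def forward(self, docs):
        h = F.relu(self.conv(docs.transpose(1, 2)))   # (batch, channels, n_docs)
        return self.head(h.transpose(1, 2)).squeeze(-1)

def listnet_loss(scores, relevance):
    """Listwise loss: cross-entropy between the softmax of predicted
    scores and the softmax of graded relevance labels (ListNet, top-1)."""
    return -(F.softmax(relevance, dim=-1) *
             F.log_softmax(scores, dim=-1)).sum(-1).mean()

# Usage: scores = ConvScorer()(doc_features); loss = listnet_loss(scores, labels)
```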
Pages: 11
相关论文
共 62 条
[21]  
Kveton B, 2015, PR MACH LEARN RES, V37, P767
[22]  
Li H., 2011, Learning to rank for information retrieval and natural language processing
[23]  
Lillicrap TP, 2015, arXiv
[24]  
Liu F, 2018, Arxiv, DOI arXiv:1802.08401
[25]   Reward Shaping-Based Actor-Critic Deep Reinforcement Learning for Residential Energy Management [J].
Lu, Renzhi ;
Jiang, Zhenyu ;
Wu, Huaming ;
Ding, Yuemin ;
Wang, Dong ;
Zhang, Hai-Tao .
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (03) :2662-2673
[26]   Win-Win Search: Dual-Agent Stochastic Game in Session Search [J].
Luo, Jiyun ;
Zhang, Sicong ;
Yang, Hui .
SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014, :587-596
[27]  
Mnih V, 2013, Arxiv, DOI [arXiv:1312.5602, 10.48550/arXiv.1312.5602]
[28]  
Mnih V, 2016, PR MACH LEARN RES, V48
[29]   Human-level control through deep reinforcement learning [J].
Mnih, Volodymyr ;
Kavukcuoglu, Koray ;
Silver, David ;
Rusu, Andrei A. ;
Veness, Joel ;
Bellemare, Marc G. ;
Graves, Alex ;
Riedmiller, Martin ;
Fidjeland, Andreas K. ;
Ostrovski, Georg ;
Petersen, Stig ;
Beattie, Charles ;
Sadik, Amir ;
Antonoglou, Ioannis ;
King, Helen ;
Kumaran, Dharshan ;
Wierstra, Daan ;
Legg, Shane ;
Hassabis, Demis .
NATURE, 2015, 518 (7540) :529-533
[30]   A Reinforcement Learning Framework for Relevance Feedback [J].
Montazeralghaem, Ali ;
Zamani, Hamed ;
Allan, James .
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, :59-68