Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation

Cited by: 9
Authors
Wu, Yaxiong [1]
Macdonald, Craig [1]
Ounis, Iadh [1]
Affiliation
[1] Univ Glasgow, Glasgow, Lanark, Scotland
Source
15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021) | 2021
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
interactive recommendation; multimodal; reinforcement learning;
DOI
10.1145/3460231.3474256
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In a dialog-based interactive recommendation task, users can express natural-language feedback when interacting with the recommender system. However, this feedback, which takes the form of natural-language critiques of the recommendation at each iteration, gives the recommender system only a partial portrayal of the users' preferences. Such partial observations make it challenging to correctly track the users' preferences over time, which can result in poor recommendation performance and less effective satisfaction of the users' information needs when the number of iterations is limited. Reinforcement learning, in the form of a partially observable Markov decision process (POMDP), can simulate the interactions between a partially observable environment (i.e. a user) and an agent (i.e. a recommender system). To alleviate this partial observation issue, we propose a novel dialog-based recommendation model, the Estimator-Generator-Evaluator (EGE) model, with Q-learning for POMDP, to effectively incorporate the users' preferences over time. Specifically, we leverage an Estimator to track and estimate users' preferences, a Generator to match the estimated preferences with the candidate items to rank the next recommendations, and an Evaluator to judge the quality of the estimated preferences considering the users' historical feedback. Following previous work, we train our EGE model using a user simulator, which is itself trained to describe, in natural language, the differences between the target users' preferences and the recommended items. Thorough and extensive experiments conducted on two recommendation datasets of fashion product images (namely dresses and shoes) demonstrate that our proposed EGE model yields significant improvements in comparison to the existing state-of-the-art baseline models.
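The data flow between the three components named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fixed-size preference vectors, the moving-average update, and the dot-product scoring are all assumptions made for illustration, and no learning takes place here, whereas the paper trains its components with Q-learning.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate(state, feedback, alpha=0.5):
    """Estimator (hypothetical): blend the previous preference estimate
    with the embedding of the latest feedback (a simple moving average)."""
    return [(1 - alpha) * s + alpha * f for s, f in zip(state, feedback)]


def generate(state, candidates, k=1):
    """Generator (hypothetical): rank candidate items by their dot product
    with the estimated preferences and return the top-k item names."""
    ranked = sorted(candidates, key=lambda name: dot(state, candidates[name]),
                    reverse=True)
    return ranked[:k]


def evaluate(state, history):
    """Evaluator (hypothetical): score the estimate by its mean dot product
    with all feedback embeddings seen so far."""
    return sum(dot(state, f) for f in history) / len(history)


# One interaction turn with made-up two-dimensional embeddings.
candidates = {"red_dress": [1.0, 0.0], "blue_dress": [0.0, 1.0]}
state, history = [0.0, 0.0], []
feedback = [0.0, 1.0]  # assumed embedding of a critique such as "more blue"
history.append(feedback)
state = estimate(state, feedback)
top = generate(state, candidates)
score = evaluate(state, history)
```

Because the user is only partially observed, the estimate is carried across turns and updated with each new critique rather than rebuilt from the latest feedback alone.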
Pages: 241-251
Page count: 11