Leveraging Long Short-Term User Preference in Conversational Recommendation via Multi-agent Reinforcement Learning

Cited by: 8
Authors
Deng, Yang [1]
Li, Yaliang [2]
Ding, Bolin [2]
Lam, Wai [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong 999077, Peoples R China
[2] Alibaba Grp, Bellevue, WA 98004 USA
Keywords
Oral communication; Recommender systems; History; Training; Decision making; Scalability; Representation learning; Conversational recommender system; multi-agent reinforcement learning; graph representation learning;
DOI
10.1109/TKDE.2022.3225109
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conversational recommender systems (CRS) endow traditional recommender systems with the capability of dynamically obtaining users' short-term preferences for items and attributes through interactive dialogues. CRS face three core decision-making challenges at each conversation turn: what attributes to ask about, which items to recommend, and when to ask or recommend. Previous methods mainly leverage reinforcement learning (RL) to learn conversational recommendation policies that solve one or two of these three decision-making problems, with separate conversation and recommendation components. These approaches restrict the scalability and generality of CRS and fall short of preserving a stable training procedure. In light of these challenges, we tackle the three decision-making problems in CRS as a unified policy learning task. To leverage the different features that are important to each sub-problem and to facilitate better unified policy learning in CRS, we propose two novel multi-agent RL-based frameworks, namely Independent and Hierarchical Multi-Agent UNIfied COnversational RecommeNders (IMA-UNICORN and HMA-UNICORN), respectively. Specifically, two low-level agents enrich the state representations for attribute prediction and item recommendation by combining the long-term user preference information from the historical interaction data with the short-term user preference information from the conversation history. A high-level meta agent is responsible for coordinating the low-level agents to adaptively make the final decision. Experimental results on four benchmark CRS datasets and a real-world E-Commerce application show that the proposed frameworks significantly outperform state-of-the-art methods. Extensive analyses further demonstrate the superior scalability of the MARL frameworks in multi-round conversational recommendation.
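The hierarchical arrangement described in the abstract (two low-level agents feeding a high-level meta agent) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `combine`, `low_level_agent`, and `meta_agent`, the mixing weight `alpha`, the fixed `threshold`, and the example preference values are all assumptions made for illustration, and hand-written scoring rules stand in for the learned RL policies and graph-based representations.

```python
from typing import Dict, Tuple


def combine(long_term: Dict[str, float], short_term: Dict[str, float],
            alpha: float = 0.5) -> Dict[str, float]:
    """Blend long-term (history) and short-term (conversation) preference scores.

    `alpha` is a hypothetical mixing weight, not a parameter from the paper.
    """
    keys = set(long_term) | set(short_term)
    return {k: alpha * long_term.get(k, 0.0) + (1 - alpha) * short_term.get(k, 0.0)
            for k in keys}


def low_level_agent(long_term: Dict[str, float],
                    short_term: Dict[str, float]) -> Tuple[str, float]:
    """Toy low-level agent: return the best-scoring candidate and its score."""
    scores = combine(long_term, short_term)
    # Iterate over sorted keys so ties are broken deterministically.
    best = max(sorted(scores), key=lambda k: scores[k])
    return best, scores[best]


def meta_agent(attr_choice: Tuple[str, float], item_choice: Tuple[str, float],
               threshold: float = 0.6) -> Tuple[str, str]:
    """Toy meta agent: recommend when the item agent is confident, else ask.

    A fixed threshold replaces the learned high-level policy (assumption).
    """
    attribute, _ = attr_choice
    item, item_score = item_choice
    if item_score >= threshold:
        return ("recommend", item)
    return ("ask", attribute)


# Early in the conversation: little short-term evidence about items.
attr = low_level_agent({"jazz": 0.2, "rock": 0.8}, {"jazz": 0.9})
item = low_level_agent({"album_a": 0.4, "album_b": 0.6}, {"album_a": 0.5})
early_decision = meta_agent(attr, item)

# Later: the conversation strongly supports one item.
item_later = low_level_agent({"album_a": 0.4, "album_b": 0.6}, {"album_a": 1.0})
late_decision = meta_agent(attr, item_later)
```

In this sketch the meta agent asks about an attribute while item confidence is low and switches to recommending once the conversation evidence pushes an item's blended score past the threshold.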
Pages: 11541-11555
Number of pages: 15