A reinforcement learning recommender system using bi-clustering and Markov Decision Process

Cited by: 8
Authors
Iftikhar, Arta [1 ]
Ghazanfar, Mustansar Ali [2 ]
Ayub, Mubbashir [1 ]
Alahmari, Saad Ali [3 ]
Qazi, Nadeem [2 ]
Wall, Julie [2 ]
Affiliations
[1] Univ Engn & Technol, Dept Software Engn, Taxila, Pakistan
[2] Univ East London, Dept Comp Sci & Digital Technol, London, England
[3] AL Imam Mohammad Ibn Saud Islamic Univ, Dept Comp Sci, Riyadh, Saudi Arabia
Keywords
Reinforcement learning; Markov Decision Process; Bi-clustering; Q-learning; Policy; ALGORITHM; PERSONALIZATION; ACCURACY; IMPROVE;
DOI
10.1016/j.eswa.2023.121541
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Collaborative filtering (CF) recommender systems are static in nature and do not adapt well to changing user preferences. User preferences may change after interacting with a system or after buying a product. Conventional CF clustering algorithms identify only the global distribution of patterns and hidden correlations. Their inability to discover local patterns led to the popularization of bi-clustering algorithms, which analyze all dataset dimensions simultaneously and consequently discover local patterns that deliver a better understanding of the underlying hidden correlations. In this paper, we model the recommendation problem as a sequential decision-making problem using a Markov Decision Process (MDP). To perform state representation for the MDP, we first convert the user-item votings matrix to a binary matrix. We then perform bi-clustering on this binary matrix to determine subsets of similar rows and columns. A bi-cluster merging algorithm is designed to merge similar and overlapping bi-clusters. These bi-clusters are then mapped to a squared grid (SG). Reinforcement learning (RL) is applied to this SG to determine the best policy for giving recommendations to users. The start state is determined using the Improved Triangle Similarity (ITR) similarity measure. The reward function is computed as the overlap between grid states, in terms of users and items, for the current and prospective next state. A thorough comparative analysis was conducted, encompassing a diverse array of methodologies, including RL-based, pure CF, and clustering methods. The results demonstrate that our proposed method outperforms its competitors in terms of precision, recall, and optimal policy learning.
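The pipeline described in the abstract (binarize the votings matrix, form bi-cluster states, learn a policy over grid states with an overlap-based reward) can be sketched on toy data. Everything below that goes beyond the abstract's wording — the binarization threshold, the hand-picked bi-clusters, the Jaccard-style overlap reward, and the random-exploration Q-learning loop — is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

# Toy user-item ratings matrix (0 = no vote); values are illustrative only.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
])

# Step 1: binarize the votings matrix (threshold of 3 is an assumed parameter).
binary = (ratings >= 3).astype(int)

# Step 2: stand-in bi-clusters as (user set, item set) pairs; a real
# bi-clustering algorithm would discover these from the binary matrix,
# and a merging step would fuse similar/overlapping ones.
biclusters = [({0, 1}, {0, 1}), ({2, 3}, {2, 3}), ({1, 2}, {1, 2})]

def overlap_reward(a, b):
    """Reward: overlap of users and items between two grid states
    (Jaccard similarity of user sets plus Jaccard similarity of item sets)."""
    ua, ia = a
    ub, ib = b
    return len(ua & ub) / len(ua | ub) + len(ia & ib) / len(ia | ib)

# Step 3: tabular Q-learning over bi-cluster grid states.
n = len(biclusters)
Q = np.zeros((n, n))
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)
for _ in range(500):                      # episodes
    s = rng.integers(n)                   # a full system would pick the start
    for _ in range(10):                   # state via the ITR similarity measure
        a = rng.integers(n)               # pure exploration, for brevity
        r = overlap_reward(biclusters[s], biclusters[a])
        Q[s, a] += alpha * (r + gamma * Q[a].max() - Q[s, a])
        s = a

# Greedy policy: the best next grid state (hence item set) from each state.
policy = Q.argmax(axis=1)
```

In this sketch, recommending from state `s` means suggesting the items of the grid state `policy[s]`; the paper's method additionally learns over the merged bi-clusters mapped onto the squared grid rather than a raw bi-cluster list.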
Pages: 18
Related Papers
(50 records)
[31]   Adaptive aggregation for reinforcement learning in average reward Markov decision processes [J].
Ortner, Ronald .
ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) :321-336
[33]   On uniform concentration bounds for Bi-clustering by using the Vapnik-Chervonenkis theory [J].
Chakraborty, Saptarshi ;
Das, Swagatam .
STATISTICS & PROBABILITY LETTERS, 2021, 175
[34]   A reinforcement learning based algorithm for Markov decision processes [J].
Bhatnagar, S ;
Kumar, S .
2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005, :199-204
[35]   Bi-clustering of microarray data using a symmetry-based multi-objective optimization framework [J].
Acharya, Sudipta ;
Saha, Sriparna ;
Sahoo, Pracheta .
SOFT COMPUTING, 2019, 23 (14) :5693-5714
[36]   A sensitivity view of Markov decision processes and reinforcement learning [J].
Cao, XR .
MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14 :261-283
[37]   Session-aware recommender system using double deep reinforcement learning [J].
Khurana, Purnima ;
Gupta, Bhavna ;
Sharma, Ravish ;
Bedi, Punam .
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (02) :403-429
[39]   A Knowledge Graph-based Interactive Recommender System Using Reinforcement Learning [J].
Sun, Ruoxi ;
Yan, Jun ;
Ren, Fenghui .
2022 TENTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA, CBD, 2022, :73-78
[40]   Assistive System for People with Apraxia Using A Markov Decision Process [J].
Jean-Baptiste, Emilie M. D. ;
Russell, Martin ;
Rothstein, Pia .
E-HEALTH - FOR CONTINUITY OF CARE, 2014, 205 :687-691