xMTF: A Formula-Free Model for Reinforcement-Learning-Based Multi-Task Fusion in Recommender Systems

Cited by: 0
Authors
Cao, Yang [1 ]
Zhang, Changhao [2 ]
Chen, Xiaoshuang [1 ]
Zhan, Kaiqiao [1 ]
Wang, Ben [1 ]
Affiliations
[1] Kuaishou Technol, Beijing, Peoples R China
[2] Peking Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE ACM WEB CONFERENCE 2025, WWW 2025 | 2025
Keywords
Multi-Task Fusion; Reinforcement Learning; Recommender System;
DOI
10.1145/3696410.3714959
CLC Classification
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Recommender systems need to optimize various types of user feedback, e.g., clicks, likes, and shares. A typical recommender system handling multiple types of feedback has two components: a multi-task learning (MTL) module, predicting feedback such as click-through rate and like rate; and a multi-task fusion (MTF) module, integrating these predictions into a single score for item ranking. MTF is essential for ensuring user satisfaction, as it directly influences recommendation outcomes. Recently, reinforcement learning (RL) has been applied to MTF tasks to improve long-term user satisfaction. However, existing RL-based MTF methods are formula-based methods, which only adjust limited coefficients within pre-defined formulas. The pre-defined formulas restrict the RL search space and become a bottleneck for MTF. To overcome this, we propose a formula-free MTF framework. We demonstrate that any suitable fusion function can be expressed as a composition of single-variable monotonic functions, as per the Sprecher Representation Theorem. Leveraging this, we introduce a novel learnable monotonic fusion cell (MFC) to replace pre-defined formulas. We call this new MFC-based model eXtreme MTF (xMTF). Furthermore, we employ a two-stage hybrid (TSH) learning strategy to train xMTF effectively. By expanding the MTF search space, xMTF outperforms existing methods in extensive offline and online experiments.
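The core idea in the abstract — that a fusion function can be built from compositions of single-variable monotonic functions — can be illustrated with a small sketch. This is not the paper's MFC implementation; it is a minimal NumPy illustration under assumed details: each single-variable function is a piecewise-linear map whose hinge coefficients are kept non-negative via softplus (guaranteeing monotonicity), and fusion follows the Sprecher-style form g(Σᵢ fᵢ(pᵢ)). The class and function names (`MonotonicCell`, `fuse`) are hypothetical.

```python
import numpy as np

def softplus(z):
    # Smooth map to strictly positive values.
    return np.log1p(np.exp(z))

class MonotonicCell:
    """A learnable single-variable monotone non-decreasing function:
    f(x) = b + sum_k softplus(a_k) * max(0, x - t_k).
    Non-negative hinge coefficients guarantee monotonicity for any
    parameter values, so gradient training cannot break the constraint."""
    def __init__(self, n_hinges=8, seed=0):
        rng = np.random.default_rng(seed)
        self.t = np.linspace(0.0, 1.0, n_hinges)  # fixed hinge locations on [0, 1]
        self.a = rng.normal(size=n_hinges)        # unconstrained slope parameters
        self.b = 0.0

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # (..., n_hinges) hinge activations, weighted by positive slopes.
        return self.b + np.maximum(0.0, x[..., None] - self.t) @ softplus(self.a)

def fuse(preds, inner_cells, outer_cell):
    """Sprecher-style fusion: score = g(sum_i f_i(p_i)), where every
    f_i and g is a single-variable monotone function."""
    inner = sum(cell(p) for cell, p in zip(inner_cells, preds))
    return outer_cell(inner)

# Toy example: fuse predicted click, like, and share probabilities.
cells = [MonotonicCell(seed=s) for s in (1, 2, 3)]
outer = MonotonicCell(seed=4)
score = fuse([np.array([0.9]), np.array([0.2]), np.array([0.05])], cells, outer)
```

Because every cell is monotone, the fused score is guaranteed to be non-decreasing in each predicted feedback signal while remaining far more expressive than a fixed weighted-sum or product formula — which is the search-space expansion the abstract argues for.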
Pages: 3840-3849
Page count: 10