xMTF: A Formula-Free Model for Reinforcement-Learning-Based Multi-Task Fusion in Recommender Systems

Cited by: 0
Authors
Cao, Yang [1 ]
Zhang, Changhao [2 ]
Chen, Xiaoshuang [1 ]
Zhan, Kaiqiao [1 ]
Wang, Ben [1 ]
Affiliations
[1] Kuaishou Technol, Beijing, Peoples R China
[2] Peking Univ, Beijing, Peoples R China
Source
PROCEEDINGS OF THE ACM WEB CONFERENCE 2025, WWW 2025 | 2025
Keywords
Multi-Task Fusion; Reinforcement Learning; Recommender System;
DOI
10.1145/3696410.3714959
CLC Classification
TP39 [Computer Applications];
Discipline Codes
081203; 0835;
Abstract
Recommender systems need to optimize various types of user feedback, e.g., clicks, likes, and shares. A typical recommender system handling multiple types of feedback has two components: a multi-task learning (MTL) module, predicting feedback such as click-through rate and like rate; and a multi-task fusion (MTF) module, integrating these predictions into a single score for item ranking. MTF is essential for ensuring user satisfaction, as it directly influences recommendation outcomes. Recently, reinforcement learning (RL) has been applied to MTF tasks to improve long-term user satisfaction. However, existing RL-based MTF methods are formula-based methods, which only adjust limited coefficients within pre-defined formulas. The pre-defined formulas restrict the RL search space and become a bottleneck for MTF. To overcome this, we propose a formula-free MTF framework. We demonstrate that any suitable fusion function can be expressed as a composition of single-variable monotonic functions, as per the Sprecher Representation Theorem. Leveraging this, we introduce a novel learnable monotonic fusion cell (MFC) to replace pre-defined formulas. We call this new MFC-based model eXtreme MTF (xMTF). Furthermore, we employ a two-stage hybrid (TSH) learning strategy to train xMTF effectively. By expanding the MTF search space, xMTF outperforms existing methods in extensive offline and online experiments.
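The core idea in the abstract — that a fusion function can be built from compositions of single-variable monotonic functions — can be illustrated with a small sketch. This is not the paper's MFC implementation; it is a minimal NumPy illustration under assumed details: each single-variable function is a piecewise-linear map whose hinge coefficients are kept non-negative via softplus (guaranteeing monotonicity), and fusion follows the Sprecher-style form g(Σᵢ fᵢ(pᵢ)). The class and function names (`MonotonicCell`, `fuse`) are hypothetical.

```python
import numpy as np

def softplus(z):
    # Smooth map to strictly positive values.
    return np.log1p(np.exp(z))

class MonotonicCell:
    """A learnable single-variable monotone non-decreasing function:
    f(x) = b + sum_k softplus(a_k) * max(0, x - t_k).
    Non-negative hinge coefficients guarantee monotonicity for any
    parameter values, so gradient training cannot break the constraint."""
    def __init__(self, n_hinges=8, seed=0):
        rng = np.random.default_rng(seed)
        self.t = np.linspace(0.0, 1.0, n_hinges)  # fixed hinge locations on [0, 1]
        self.a = rng.normal(size=n_hinges)        # unconstrained slope parameters
        self.b = 0.0

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # (..., n_hinges) hinge activations, weighted by positive slopes.
        return self.b + np.maximum(0.0, x[..., None] - self.t) @ softplus(self.a)

def fuse(preds, inner_cells, outer_cell):
    """Sprecher-style fusion: score = g(sum_i f_i(p_i)), where every
    f_i and g is a single-variable monotone function."""
    inner = sum(cell(p) for cell, p in zip(inner_cells, preds))
    return outer_cell(inner)

# Toy example: fuse predicted click, like, and share probabilities.
cells = [MonotonicCell(seed=s) for s in (1, 2, 3)]
outer = MonotonicCell(seed=4)
score = fuse([np.array([0.9]), np.array([0.2]), np.array([0.05])], cells, outer)
```

Because every cell is monotone, the fused score is guaranteed to be non-decreasing in each predicted feedback signal while remaining far more expressive than a fixed weighted-sum or product formula — which is the search-space expansion the abstract argues for.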
Pages: 3840-3849
Page count: 10