Zwei: A Self-Play Reinforcement Learning Framework for Video Transmission Services

Cited by: 15

Authors:

Huang, Tianchi [1,2]

Zhang, Rui-Xiao [1]

Sun, Lifeng [1,2,3]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Key Lab Networked Multimedia, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China

[3] Tsinghua Univ, Minist Educ, Key Lab Pervas Comp, Beijing, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2022 / Vol. 24

Funding:

National Key Research and Development Program of China;

Keywords:

Video transmission; self-play; reinforcement learning;

DOI:

10.1109/TMM.2021.3063620

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that the given function often fails to describe the requirement accurately, so the resulting methods fail to meet it. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly utilizing the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and instantly estimates the win rate w.r.t. the competition outcome, where the outcome represents which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei optimizes itself faithfully according to the assigned requirement, outperforming the state-of-the-art methods under all considered scenarios. Moreover, we further propose Zwei(+), which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei(+) in the adaptive bitrate streaming task and show the superiority of the proposed method over existing state-of-the-art approaches.
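The win-rate estimate described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The `Outcome` fields, the pairwise rule in `closer_to_requirement` (less rebuffering first, then higher bitrate), and the half-point credit for ties are all assumptions chosen for illustration; the abstract only states that trajectories rolled out from the same initial state compete on which is closer to the assigned requirement.

```python
from itertools import combinations
from typing import List, NamedTuple


class Outcome(NamedTuple):
    """Summary of one rolled-out trajectory (hypothetical metrics)."""
    rebuffer_s: float    # total stall time in seconds, lower is better
    bitrate_kbps: float  # mean video bitrate, higher is better


def closer_to_requirement(a: Outcome, b: Outcome) -> int:
    """Return 1 if a wins, -1 if b wins, 0 on a tie.

    Assumed requirement: minimize rebuffering first, then maximize bitrate.
    The rule compares trajectories directly; no weighted sum is involved.
    """
    if a.rebuffer_s != b.rebuffer_s:
        return 1 if a.rebuffer_s < b.rebuffer_s else -1
    if a.bitrate_kbps != b.bitrate_kbps:
        return 1 if a.bitrate_kbps > b.bitrate_kbps else -1
    return 0


def win_rates(outcomes: List[Outcome]) -> List[float]:
    """Estimate each trajectory's win rate against the other trajectories
    that were rolled out from the same initial state."""
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        result = closer_to_requirement(outcomes[i], outcomes[j])
        if result > 0:
            scores[i] += 1.0
        elif result < 0:
            scores[j] += 1.0
        else:
            # Assumption: a tie gives half a win to each side.
            scores[i] += 0.5
            scores[j] += 0.5
    return [s / (n - 1) for s in scores]


if __name__ == "__main__":
    rollouts = [Outcome(0.0, 1200.0), Outcome(2.0, 3000.0), Outcome(0.0, 2000.0)]
    print(win_rates(rollouts))
```

In a training loop, such win rates would take the place of a hand-weighted reward as the learning signal; how Zwei actually feeds them into the policy update is described in the paper, not in this record.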

Pages: 1350-1365

Page count: 16

