Zwei: A Self-Play Reinforcement Learning Framework for Video Transmission Services

Cited by: 15
Authors
Huang, Tianchi [1, 2]
Zhang, Rui-Xiao [1]
Sun, Lifeng [1, 2, 3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Key Lab Networked Multimedia, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Minist Educ, Key Lab Pervas Comp, Beijing, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Video transmission; self-play; reinforcement learning;
DOI
10.1109/TMM.2021.3063620
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that such a function often fails to describe the requirement accurately, so the resulting methods can violate the actual requirement. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly using the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and instantly estimates the win rate w.r.t. the competition outcome, where the outcome represents which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei optimizes itself faithfully according to the assigned requirement, outperforming the state-of-the-art methods under all considered scenarios. Moreover, we propose Zwei(+), which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei(+) in the adaptive bitrate streaming task and show the superiority of the proposed method over existing state-of-the-art approaches.
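The rollout-and-compare step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy bitrate ladder `BITRATES`, the bandwidth trace, the random policy in `rollout`, and the example requirement in `battle` (less rebuffering first, then higher bitrate). It only shows how pairwise competition outcomes among trajectories started from the same initial state can be turned into a per-trajectory win rate.

```python
import random

# Hypothetical bitrate ladder in Mbps (an assumption, not from the paper).
BITRATES = [1.0, 2.0, 4.0]


def rollout(weights, trace, rng):
    """Roll out one toy trajectory over a bandwidth trace.

    Returns (mean_bitrate, total_rebuffer_seconds) for 1-second chunks.
    """
    total_bitrate, rebuffer = 0.0, 0.0
    for bandwidth in trace:
        bitrate = rng.choices(BITRATES, weights=weights)[0]
        total_bitrate += bitrate
        # Download time of a 1-second chunk beyond its 1 second of playback.
        rebuffer += max(0.0, bitrate / bandwidth - 1.0)
    return total_bitrate / len(trace), rebuffer


def battle(a, b):
    """Competition outcome: 1.0 if a wins, 0.0 if b wins, 0.5 on a tie.

    Assumed requirement: less rebuffering first, then higher bitrate.
    """
    if a[1] != b[1]:
        return 1.0 if a[1] < b[1] else 0.0
    if a[0] != b[0]:
        return 1.0 if a[0] > b[0] else 0.0
    return 0.5


def win_rates(outcomes):
    """Average each trajectory's battle result against all the others."""
    n = len(outcomes)
    return [
        sum(battle(outcomes[i], outcomes[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# All trajectories start from the same initial state (same trace, same policy).
rng = random.Random(0)
trace = [1.5, 3.0, 0.8, 2.5]
outcomes = [rollout([1, 1, 1], trace, rng) for _ in range(8)]
rates = win_rates(outcomes)
```

In a self-play training loop, an estimated win rate of this kind would stand in for the hand-weighted linear reward when updating the policy; the update rule itself is left out of this sketch.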
Pages: 1350 - 1365
Page count: 16