Zwei: A Self-Play Reinforcement Learning Framework for Video Transmission Services

Cited by: 15
Authors
Huang, Tianchi [1 ,2 ]
Zhang, Rui-Xiao [1 ]
Sun, Lifeng [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Key Lab Networked Multimedia, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 10084, Peoples R China
[3] Tsinghua Univ, Minist Educ, Key Lab Pervas Comp, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Video transmission; self-play; reinforcement learning;
DOI
10.1109/TMM.2021.3063620
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that such a function often fails to describe the requirement accurately, so methods optimized against it can violate the actual requirement. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly using the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and instantly estimates the win rate with respect to the competition outcome, where the outcome indicates which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei faithfully optimizes itself according to the assigned requirement, outperforming the state-of-the-art methods under all considered scenarios. We further propose Zwei(+), which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei(+) in the adaptive bitrate streaming task and show the superiority of the proposed method over existing state-of-the-art approaches.
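The win-rate idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the trajectory summary (total rebuffering time, mean bitrate), the requirement rule (less rebuffering first, then higher bitrate), and the names `closer_to_requirement` and `win_rates` are all hypothetical choices made for illustration. The sketch only shows how trajectories rolled out from the same initial state might be compared pairwise against a rule-based requirement, with each trajectory's win rate taken as its share of pairwise wins.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical trajectory summary: (total rebuffering in seconds, mean bitrate in kbps).
Trajectory = Tuple[float, float]


def closer_to_requirement(a: Trajectory, b: Trajectory) -> int:
    """Return 1 if `a` wins, -1 if `b` wins, 0 on a tie.

    Assumed requirement (illustrative only): prefer less rebuffering;
    break ties by preferring the higher mean bitrate.
    """
    if a[0] != b[0]:
        return 1 if a[0] < b[0] else -1
    if a[1] != b[1]:
        return 1 if a[1] > b[1] else -1
    return 0


def win_rates(
    trajectories: List[Trajectory],
    compare: Callable[[Trajectory, Trajectory], int],
) -> List[float]:
    """Estimate each trajectory's win rate from all pairwise competitions.

    All trajectories are assumed to start from the same initial state,
    so the comparison outcome reflects the decisions taken, not the start.
    """
    n = len(trajectories)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        outcome = compare(trajectories[i], trajectories[j])
        if outcome > 0:
            wins[i] += 1.0
        elif outcome < 0:
            wins[j] += 1.0
        else:
            # A tie counts as half a win for each side.
            wins[i] += 0.5
            wins[j] += 0.5
    # Each trajectory plays n - 1 matches.
    return [w / (n - 1) for w in wins]


if __name__ == "__main__":
    rollouts = [(0.0, 1200.0), (2.0, 3000.0), (0.0, 800.0)]
    print(win_rates(rollouts, closer_to_requirement))
```

In a training loop, these win rates would stand in for the return that a weighted linear reward function normally provides; how the policy update consumes them is outside the scope of this sketch.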
Pages: 1350-1365
Number of pages: 16