Zwei: A Self-Play Reinforcement Learning Framework for Video Transmission Services

Cited by: 15

Authors:

Huang, Tianchi [1,2]

Zhang, Rui-Xiao [1]

Sun, Lifeng [1,2,3]

Affiliations:

[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Key Lab Networked Multimedia, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Dept Comp Sci & Technol, BNRist, Beijing 100084, Peoples R China

[3] Tsinghua Univ, Minist Educ, Key Lab Pervas Comp, Beijing, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2022 / Vol. 24

Funding:

National Key Research and Development Program of China;

Keywords:

Video transmission; self-play; reinforcement learning;

DOI:

10.1109/TMM.2021.3063620

Chinese Library Classification (CLC):

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated by a function that linearly combines several weighted metrics. Nevertheless, we observe that the given function often fails to describe the requirement accurately, so the resulting methods can violate the actual requirement. We propose Zwei, a self-play reinforcement learning framework that updates the policy by directly using the actual requirement. Technically, Zwei rolls out trajectories from the same initial state and immediately estimates the win rate w.r.t. the competition outcome, where the outcome represents which trajectory is closer to the assigned requirement. We evaluate Zwei with different requirements on various video transmission tasks, including adaptive bitrate streaming, crowd-sourced live streaming scheduling, and real-time communication. Results indicate that Zwei faithfully optimizes itself according to the assigned requirement, outperforming the state-of-the-art methods under all considered scenarios. Moreover, we further propose Zwei(+), which enables Zwei to learn policies in the vanilla no-regret reinforcement learning scenario. We validate Zwei(+) in the adaptive bitrate streaming task and show the superiority of the proposed method over existing state-of-the-art approaches.
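
Illustrative note (not part of the published abstract): the competition step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The trajectory summary (rebuffering time, mean bitrate), the rule `prefer_low_rebuffer`, and the helper `win_rates` are all hypothetical names chosen for illustration; the abstract only says that trajectories rolled out from the same initial state compete and that a win rate is estimated from the outcomes.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical trajectory summary: (total rebuffering seconds, mean bitrate in kbps).
Summary = Tuple[float, float]


def prefer_low_rebuffer(a: Summary, b: Summary) -> int:
    """Assumed example requirement: less rebuffering wins; ties go to higher bitrate.

    Returns 1 if a wins, -1 if b wins, 0 for a draw.
    """
    if a[0] != b[0]:
        return 1 if a[0] < b[0] else -1
    if a[1] != b[1]:
        return 1 if a[1] > b[1] else -1
    return 0


def win_rates(summaries: List[Summary],
              compare: Callable[[Summary, Summary], int]) -> List[float]:
    """Score each trajectory by its average outcome against all the others."""
    n = len(summaries)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    points = [0.0] * n
    for i, j in combinations(range(n), 2):
        outcome = compare(summaries[i], summaries[j])
        if outcome > 0:
            points[i] += 1.0
        elif outcome < 0:
            points[j] += 1.0
        else:  # draw: split the point between both trajectories
            points[i] += 0.5
            points[j] += 0.5
    return [p / (n - 1) for p in points]


# Four made-up rollouts that share one initial state.
rollouts = [(0.0, 1200.0), (2.5, 2800.0), (0.0, 1850.0), (2.5, 2800.0)]
rates = win_rates(rollouts, prefer_low_rebuffer)
```

In a training loop, such per-trajectory win rates could stand in for the weighted-sum reward when updating the policy. How Zwei actually turns them into a policy update is not stated in this record.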

Pages: 1350-1365

Page count: 16
