Learning Tailored Adaptive Bitrate Algorithms to Heterogeneous Network Conditions: A Domain-Specific Priors and Meta-Reinforcement Learning Approach

Cited by: 32
Authors
Huang, Tianchi [1]
Zhou, Chao [2]
Zhang, Rui-Xiao [3]
Wu, Chenglei [3]
Sun, Lifeng [1, 3, 4]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing Key Lab Networked Multimedia, Beijing 100084, Peoples R China
[2] Beijing Kuaishou Technol Co Ltd, Beijing 100085, Peoples R China
[3] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[4] Tsinghua Univ, Key Lab Pervas Comp, Minist Educ, Beijing 100084, Peoples R China
Keywords
Streaming media; reinforcement learning (RL); adaptive control
DOI
10.1109/JSAC.2022.3180804
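Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809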
Abstract
Internet adaptive video streaming is a typical form of video delivery that relies on adaptive bitrate (ABR) algorithms to provide high quality of experience (QoE) to users under diverse network conditions. Such heterogeneous network environments, which can be viewed as exogenous input processes, often make the performance of ABR algorithms unstable. Learning-based ABR algorithms generated by state-of-the-art reinforcement learning (RL) techniques achieve good average performance but fail to perform well across all kinds of network conditions. In this work, modeling the video playback process as an Input-driven Markov Decision Process (IMDP), we propose A²BR (Adaptation of ABR), a novel meta-RL ABR approach. A²BR consists of an offline stage and an online stage. At the offline stage, it uses meta-RL to learn an initial meta-policy over various network conditions; at the online stage, it makes decisions under each user's own network conditions. At the same time, it continually optimizes the meta-policy into an ABR policy tailored to the current network environment within a few shots. Moreover, to improve learning efficiency, we use domain knowledge to implement a virtual player that replays previously experienced network conditions. Trace-driven experiments on various scenarios, including different vehicles, users, network types, and heterogeneous user preferences, show that A²BR outperforms recent ABR approaches by rapidly adapting to personalized QoE metrics and specific network conditions. Testbed experimental results also illustrate the superiority of A²BR in adapting to unseen environments.
Pages: 2485-2503
Number of pages: 19