TCLiVi: Transmission Control in Live Video Streaming Based on Deep Reinforcement Learning

Cited by: 37
Authors
Cui, Laizhong [1, 2]
Su, Dongyuan [1, 2]
Yang, Shu [1, 2]
Wang, Zhi [3]
Ming, Zhong [1, 2]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Shenzhen Univ, Guangdong Lab Artificial Intelligence & Cyber Eco, Shenzhen 518060, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Live video streaming; reinforcement learning; joint optimization; adaptive transmission control; ADAPTATION; DASH;
DOI
10.1109/TMM.2020.2985631
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Video content currently accounts for the majority of network traffic. As live streaming grows, it places stricter requirements on Quality of Experience (QoE). Achieving satisfactory QoE in live streaming is challenging because it requires balancing 1) higher video quality and stability against 2) lower rebuffering time and end-to-end delay, across scenarios with varying network conditions and user preferences, where fluctuations in network throughput severely degrade the QoE. In this paper, we propose TCLiVi, an approach based on Deep Reinforcement Learning (DRL) that improves the QoE of live video streaming. The approach jointly adjusts the streaming parameters, including the video bitrate and the target buffer size. Built on a basic DRL framework, TCLiVi automatically generates an inference model from playback information to jointly optimize video quality, stability, rebuffering time, and latency. We evaluate our framework on real-world data in different live streaming broadcast scenarios, such as a talent show and a sports competition, under different network conditions. We compare TCLiVi with other algorithms, such as Double DQN, MPC, and buffer-based algorithms. The simulation results show that TCLiVi significantly improves the video quality and decreases the rebuffering time, consequently increasing the QoE score by 40.84% on average. We also show that TCLiVi is self-adaptive in different scenarios.
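The trade-off described in the abstract can be illustrated with a minimal sketch of a linear QoE score that rewards video quality and penalizes bitrate switches, rebuffering time, and latency. This is not the QoE model or reward function from the paper, which the abstract does not specify: the function name `qoe_score`, the class `QoEWeights`, the linear form, and all weight values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoEWeights:
    # Hypothetical weights for illustration; not taken from the paper.
    quality: float = 1.0     # reward per Mbps of delivered bitrate
    smoothness: float = 1.0  # penalty per Mbps of bitrate change
    rebuffer: float = 4.0    # penalty per second of rebuffering
    latency: float = 0.5     # penalty per second of end-to-end delay


def qoe_score(bitrates_mbps, rebuffer_s, latency_s, w=QoEWeights()):
    """Return an assumed linear QoE score for one playback session.

    Each argument is a per-chunk list: chosen bitrate (Mbps),
    rebuffering time (s), and end-to-end latency (s).
    """
    if not (len(bitrates_mbps) == len(rebuffer_s) == len(latency_s)):
        raise ValueError("per-chunk lists must have equal length")
    quality = sum(bitrates_mbps)
    # Stability term: total magnitude of bitrate switches between adjacent chunks.
    switches = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return (w.quality * quality
            - w.smoothness * switches
            - w.rebuffer * sum(rebuffer_s)
            - w.latency * sum(latency_s))
```

Under these assumed weights, a session that switches bitrate and stalls scores lower than a steady session at the same average bitrate, which is the kind of balance a DRL agent choosing bitrate and target buffer size would be trained to optimize.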
Pages: 651-663
Number of pages: 13