Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework

Cited by: 21

Authors:

Wei, Xuekai [1]

Zhou, Mingliang [2,3]

Kwong, Sam [1,4]

Yuan, Hui [5]

Wang, Shiqi [1]

Zhu, Guopu [6]

Cao, Jingchao [1]

Affiliations:

[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong 999077, Peoples R China

[2] Chongqing Univ, Sch Comp Sci, Chongqing 400044, Peoples R China

[3] Univ Macau, State Key Lab Internet Things Smart City, Taipa 999078, Macao, Peoples R China

[4] City Univ Hong Kong, Shenzhen Res Inst, Shenzhen 518057, Peoples R China

[5] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Peoples R China

[6] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

Source:

INFORMATION SCIENCES | 2021 / Vol. 569

Keywords:

MPEG-DASH; Quality of experience; Machine learning; Reinforcement learning; DASH; TIME; ADAPTATION; ALGORITHM; MODEL;

DOI:

10.1016/j.ins.2021.05.012

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The Dynamic Adaptive Streaming over HTTP (DASH) standard has been widely adopted by content providers for online video transmission and has greatly improved performance. Designing an efficient DASH system is challenging because of the inherent large fluctuations characterizing both encoded video sequences and network traces. In this paper, a reinforcement learning (RL)-based DASH technique that addresses user quality of experience (QoE) is constructed. The DASH adaptive bitrate (ABR) selection problem is formulated as a Markov decision process (MDP). Accordingly, an RL-based solution is proposed to solve the MDP, in which the DASH client acts as the RL agent and the network variation constitutes the environment. The proposed user QoE is used as the reward by jointly considering the video quality and buffer status. The goal of the RL algorithm is to select a suitable video quality level for each video segment to maximize the total reward. Then, the proposed RL-based ABR algorithm is embedded in the QoE-oriented DASH framework. Experimental results show that the proposed RL-based ABR algorithm outperforms state-of-the-art schemes in terms of both temporal and visual QoE factors by a noticeable margin while guaranteeing application-level fairness when multiple clients share a bottlenecked network. (c) 2021 Elsevier Inc. All rights reserved.
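
The MDP view in the abstract (client as agent, network as environment, QoE from video quality and buffer status as reward) can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' algorithm: the bitrate ladder, segment length, buffer bins, reward weights, and uniform random bandwidth model below are all hypothetical values chosen only for illustration.

```python
import random

# All constants below are illustrative assumptions, not values from the paper.
BITRATES_MBPS = [0.5, 1.0, 2.0, 4.0]   # hypothetical quality ladder
SEGMENT_SECONDS = 2.0                  # hypothetical segment duration
MAX_BUFFER_SECONDS = 20.0              # hypothetical buffer cap
STALL_WEIGHT = 4.0                     # hypothetical penalty per stalled second
N_BUFFER_BINS = 5                      # state = discretized buffer level


def step(buffer_s, level, bandwidth_mbps):
    """Download one segment at the chosen level; return (new_buffer_s, rebuffer_s)."""
    download_s = BITRATES_MBPS[level] * SEGMENT_SECONDS / bandwidth_mbps
    rebuffer_s = max(0.0, download_s - buffer_s)          # playback stalls if buffer runs dry
    new_buffer = max(0.0, buffer_s - download_s) + SEGMENT_SECONDS
    return min(new_buffer, MAX_BUFFER_SECONDS), rebuffer_s


def reward(level, rebuffer_s):
    """Toy QoE: segment quality (bitrate) minus a stall penalty."""
    return BITRATES_MBPS[level] - STALL_WEIGHT * rebuffer_s


def discretize(buffer_s):
    """Map buffer occupancy in seconds to one of N_BUFFER_BINS states."""
    return min(int(buffer_s // 4), N_BUFFER_BINS - 1)


def train(episodes=300, segments=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over (buffer bin, quality level) pairs."""
    rng = random.Random(seed)
    n_actions = len(BITRATES_MBPS)
    q = {(s, a): 0.0 for s in range(N_BUFFER_BINS) for a in range(n_actions)}
    for _ in range(episodes):
        buffer_s = 0.0
        for _ in range(segments):
            s = discretize(buffer_s)
            if rng.random() < epsilon:                     # explore
                a = rng.randrange(n_actions)
            else:                                          # exploit current estimate
                a = max(range(n_actions), key=lambda x: q[(s, x)])
            bandwidth = rng.uniform(0.8, 3.0)              # assumed network variation
            buffer_s, rebuffer_s = step(buffer_s, a, bandwidth)
            r = reward(a, rebuffer_s)
            s2 = discretize(buffer_s)
            best_next = max(q[(s2, x)] for x in range(n_actions))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


if __name__ == "__main__":
    q_table = train()
    for s in range(N_BUFFER_BINS):
        best = max(range(len(BITRATES_MBPS)), key=lambda a: q_table[(s, a)])
        print(f"buffer bin {s}: preferred bitrate {BITRATES_MBPS[best]} Mbps")
```

The state here is only the buffer level, which keeps the table small; a realistic agent would likely also observe recent throughput and the previously selected quality level.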

Pages: 786-803

Page count: 18
