DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction

Times cited: 53

Authors:

Zhang, Huaizheng [1]

Dong, Linsen [1]

Gao, Guanyu [2]

Hu, Han [3]

Wen, Yonggang [1]

Guan, Kyle [4]

Affiliations:

[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Nanyang Ave, Singapore 639798, Singapore

[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China

[3] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China

[4] Nokia Bell Labs, Holmdel, NJ 07733 USA

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / No. 12

Funding:

National Natural Science Foundation of China; National Research Foundation, Singapore;

Keywords:

Quality of experience; Feature extraction; Task analysis; Streaming media; Machine learning; Predictive models; Measurement; Video quality of experience; deep learning; feature; representation; adaptive video streaming; MODELS;

DOI:

10.1109/TMM.2020.2973828

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

Many models have recently been developed to predict video Quality of Experience (QoE), yet their applicability still faces significant challenges. First, many models rely on features that are unique to a specific dataset and therefore fail to generalize. Because of the intricate interactions among these features, a unified representation that is independent of datasets with different modalities is needed. Second, existing models often lack the configurability to perform both classification and regression tasks. Third, the datasets available for developing these models are often very small, and the impact of limited data on the performance of QoE models has not been adequately addressed. To address these issues, we develop a novel end-to-end framework termed DeepQoE. The framework first uses a combination of deep learning techniques, such as word embedding and a 3D convolutional neural network (C3D), to extract generalized features. These features are then combined and fed into a neural network for representation learning, and the learned representation serves as input for classification or regression tasks. We evaluate DeepQoE on three datasets. The results show that for the small datasets (WHU-MVQoE2016 and the Live-Netflix Video Database), using the QoE representation from DeepQoE greatly improves the performance of state-of-the-art machine learning algorithms (e.g., from 35.71% to 44.82%), while for the large dataset (VideoSet), DeepQoE achieves a significant improvement over the best baseline method (90.94% vs. 82.84%). Beyond the improved performance, DeepQoE has the flexibility to fit different datasets, to learn a QoE representation, and to perform both classification and regression tasks.
We also develop a DeepQoE-based adaptive bitrate (ABR) streaming system to verify that the framework can be easily applied to multimedia communication services. The DeepQoE software package has been released to facilitate ongoing QoE research.
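The feature-fusion idea summarized in the abstract (extract features per modality, concatenate them, learn a shared representation, then attach either a classification or a regression head) can be illustrated with a minimal, untrained sketch. This is not the authors' implementation. Assumptions: a small lookup table stands in for word embedding of a categorical feature such as genre; a fixed numeric vector stands in for the C3D clip feature (no 3D convolution is performed); a single dense ReLU layer stands in for the representation network; bitrate as an input feature is an assumption, not something the abstract states; and all names, sizes, and the `choose_bitrate` helper are hypothetical. Weights are random, so the outputs only show the data flow, not meaningful QoE predictions.

```python
import math
import random

# Hypothetical sizes; the abstract does not specify any of them.
EMBED_DIM = 4      # size of each embedding vector (stand-in for word embedding)
VIDEO_DIM = 6      # stand-in for a C3D clip feature vector
HIDDEN_DIM = 8     # size of the learned QoE representation
NUM_CLASSES = 5    # assumed 5-level opinion scale for the classification head

rng = random.Random(0)  # seeded so the untrained weights are reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical vocabulary for one categorical feature (e.g. video genre).
VOCAB = {"sports": 0, "news": 1, "movie": 2}
EMBEDDING = rand_matrix(len(VOCAB), EMBED_DIM)

# Random, untrained weights; +1 input column for the numeric bitrate feature.
W_REP = rand_matrix(HIDDEN_DIM, EMBED_DIM + VIDEO_DIM + 1)
B_REP = [0.0] * HIDDEN_DIM
W_CLS = rand_matrix(NUM_CLASSES, HIDDEN_DIM)
W_REG = rand_matrix(1, HIDDEN_DIM)

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def represent(genre, video_feat, bitrate_mbps):
    """Fuse embedded, video, and numeric features into one ReLU representation."""
    features = EMBEDDING[VOCAB[genre]] + list(video_feat) + [bitrate_mbps]
    pre = [z + b for z, b in zip(matvec(W_REP, features), B_REP)]
    return [max(0.0, z) for z in pre]

def classify(rep):
    """Classification head: softmax probabilities over NUM_CLASSES QoE levels."""
    logits = matvec(W_CLS, rep)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def regress(rep):
    """Regression head: a single scalar QoE score from the same representation."""
    return matvec(W_REG, rep)[0]

def choose_bitrate(genre, video_feat, candidates, bandwidth_mbps):
    """Hypothetical ABR rule: pick the affordable bitrate with the best predicted score."""
    feasible = [b for b in candidates if b <= bandwidth_mbps] or [min(candidates)]
    return max(feasible, key=lambda b: regress(represent(genre, video_feat, b)))

if __name__ == "__main__":
    clip = [0.1, 0.3, -0.2, 0.0, 0.5, 0.2]  # made-up stand-in for C3D output
    rep = represent("sports", clip, 2.5)
    print("class probabilities:", classify(rep))
    print("regression score:", regress(rep))
    print("chosen bitrate:", choose_bitrate("sports", clip, [0.5, 1.0, 2.5, 5.0], 3.0))
```

The point of the shared `represent` step is that both heads, and any downstream consumer such as the hypothetical ABR rule, reuse one learned representation; in a real system the weights would be trained on labeled QoE data rather than drawn at random.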


Pages: 3210-3223

Page count: 14

Number of cited references: 63
