MANSY: Generalizing Neural Adaptive Immersive Video Streaming With Ensemble and Representation Learning

Cited by: 1

Authors:

Wu, Duo ^{[1,2,3]}

Wu, Panlong ^{[1,2]}

Zhang, Miao ^{[4]}

Wang, Fangxin ^{[5,6]}

Affiliations:

[1] Chinese Univ Hong Kong, Shenzhen Future Network Intelligence Inst FNii She, Shenzhen 518172, Peoples R China

[2] Chinese Univ Hong Kong, Sch Sci & Engn SSE, Shenzhen 518172, Peoples R China

[3] Tsinghua Univ, Shenzhen Int Grad Sch, Beijing 100190, Peoples R China

[4] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada

[5] Chinese Univ Hong Kong, Shenzhen Future Network Intelligence Inst FNii She, Sch Sci & Engn SSE, Shenzhen 518172, Peoples R China

[6] Chinese Univ Hong Kong, Guangdong Prov Key Lab Future Networks Intelligenc, Shenzhen 518172, Peoples R China

Source:

IEEE TRANSACTIONS ON MOBILE COMPUTING | 2025 / Vol. 24 / No. 03

Keywords:

Quality of experience; Predictive models; Bit rate; Streaming media; Training; Accuracy; Computational modeling; Solid modeling; Mobile computing; Representation learning; Tile-based neural adaptive immersive video streaming; generalization; ensemble learning; representation learning;

DOI:

10.1109/TMC.2024.3487175

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The popularity of immersive videos has prompted extensive research into neural adaptive tile-based streaming to optimize video transmission over networks with limited bandwidth. However, existing neural adaptive approaches for viewport prediction and bitrate selection have not yet fully addressed the diversity of users' viewing patterns and Quality of Experience (QoE) preferences. Their performance can deteriorate significantly when users' actual viewing patterns and QoE preferences differ considerably from those observed during training, resulting in poor generalization. In this paper, we propose MANSY, a novel streaming system that embraces user diversity to improve generalization. Specifically, to accommodate users' diverse viewing patterns, we design a Transformer-based viewport prediction model with an efficient multi-viewport trajectory input-output architecture based on implicit ensemble learning. In addition, for the first time, we combine advanced representation learning with deep reinforcement learning to train the bitrate selection model to maximize diverse QoE objectives, enabling the model to generalize across users with diverse preferences. Extensive experiments demonstrate that MANSY outperforms state-of-the-art approaches in viewport prediction accuracy and QoE improvement on both trained and unseen viewing patterns and QoE preferences, achieving better generalization.


Pages: 1654-1668

Page count: 15
