MANSY: Generalizing Neural Adaptive Immersive Video Streaming With Ensemble and Representation Learning

Cited by: 1
Authors
Wu, Duo [1 ,2 ,3 ]
Wu, Panlong [1 ,2 ]
Zhang, Miao [4 ]
Wang, Fangxin [5 ,6 ]
Affiliations
[1] Chinese Univ Hong Kong, Shenzhen Future Network Intelligence Inst FNii She, Shenzhen 518172, Peoples R China
[2] Chinese Univ Hong Kong, Sch Sci & Engn SSE, Shenzhen 518172, Peoples R China
[3] Tsinghua Univ, Shenzhen Int Grad Sch, Beijing 100190, Peoples R China
[4] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[5] Chinese Univ Hong Kong, Shenzhen Future Network Intelligence Inst FNii She, Sch Sci & Engn SSE, Shenzhen 518172, Peoples R China
[6] Chinese Univ Hong Kong, Guangdong Prov Key Lab Future Networks Intelligenc, Shenzhen 518172, Peoples R China
Keywords
Quality of experience; Predictive models; Bit rate; Streaming media; Training; Accuracy; Computational modeling; Solid modeling; Mobile computing; Representation learning; Tile-based neural adaptive immersive video streaming; generalization; ensemble learning; representation learning;
DOI
10.1109/TMC.2024.3487175
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The popularity of immersive videos has prompted extensive research into neural adaptive tile-based streaming to optimize video transmission over networks with limited bandwidth. However, existing neural adaptive approaches to viewport prediction and bitrate selection have not yet fully addressed the diversity of users' viewing patterns and Quality of Experience (QoE) preferences. Their performance can deteriorate significantly when users' actual viewing patterns and QoE preferences differ considerably from those observed during training, resulting in poor generalization. In this paper, we propose MANSY, a novel streaming system that embraces user diversity to improve generalization. Specifically, to accommodate users' diverse viewing patterns, we design a Transformer-based viewport prediction model with an efficient multi-viewport trajectory input-output architecture based on implicit ensemble learning. In addition, we are the first to combine advanced representation learning with deep reinforcement learning to train the bitrate selection model to maximize diverse QoE objectives, enabling the model to generalize across users with diverse preferences. Extensive experiments demonstrate that MANSY outperforms state-of-the-art approaches in viewport prediction accuracy and QoE improvement on both trained and unseen viewing patterns and QoE preferences, achieving better generalization.
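To make the notion of "diverse QoE preferences" concrete, the sketch below scores two streaming sessions under two different user preferences. It assumes a weighted-sum QoE (quality minus rebuffering and bitrate-switch penalties), a formulation common in the adaptive streaming literature; the abstract does not state MANSY's actual QoE definition, and the class name, function name, weights, and session numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QoEPreference:
    """Hypothetical per-user weights; not taken from the paper."""
    quality: float
    rebuffer: float
    smoothness: float


def qoe(bitrates_mbps, rebuffer_s, pref):
    """Assumed weighted-sum QoE over one session.

    bitrates_mbps: bitrate chosen for each chunk (Mbps).
    rebuffer_s:    stall time incurred for each chunk (seconds).
    """
    total_quality = sum(bitrates_mbps)
    total_rebuffer = sum(rebuffer_s)
    # Sum of absolute bitrate changes between consecutive chunks.
    total_switch = sum(
        abs(nxt - cur) for cur, nxt in zip(bitrates_mbps, bitrates_mbps[1:])
    )
    return (
        pref.quality * total_quality
        - pref.rebuffer * total_rebuffer
        - pref.smoothness * total_switch
    )


# Two illustrative sessions: (bitrates per chunk, stall time per chunk).
aggressive = ([8.0, 8.0, 2.0, 8.0], [0.0, 1.5, 0.0, 0.5])
conservative = ([4.0, 4.0, 4.0, 4.0], [0.0, 0.0, 0.0, 0.0])

# Two illustrative users with different priorities.
quality_lover = QoEPreference(quality=1.0, rebuffer=1.0, smoothness=0.1)
stall_averse = QoEPreference(quality=1.0, rebuffer=8.0, smoothness=1.0)

for name, pref in [("quality_lover", quality_lover), ("stall_averse", stall_averse)]:
    scores = {
        "aggressive": qoe(*aggressive, pref),
        "conservative": qoe(*conservative, pref),
    }
    best = max(scores, key=scores.get)
    print(name, scores, "prefers", best)
```

Under these assumed weights the two users rank the same sessions in opposite order, which is the situation a bitrate selection policy trained for a single fixed preference would handle poorly.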
Pages: 1654-1668
Number of pages: 15