Global and Compact Video Context Embedding for Video Semantic Segmentation

Cited by: 0
Authors
Sun, Lei [1, 2]
Liu, Yun [3]
Sun, Guolei [2]
Wu, Min [3]
Xu, Zhijie [4]
Wang, Kaiwei [1]
Van Gool, Luc [2]
Affiliations
[1] Zhejiang Univ, Natl Res Ctr Opt Instrumentat, Hangzhou 310027, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
[3] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[4] Univ Huddersfield, Ctr Visual & Immers Comp, Huddersfield HD1 3DH, England
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Context modeling; Feature extraction; Computational modeling; Sun; Optical flow; Shape; Video semantic segmentation; global video context; compact video context; video context embedding; NETWORK;
DOI
10.1109/ACCESS.2024.3409150
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Intuitively, global video context could benefit video semantic segmentation (VSS) if it is designed to simultaneously model global temporal and spatial dependencies for a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches focus only on modeling local video context. This paper attempts to bridge this gap by learning global video context for VSS. Apart from being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to improve their distinguishability. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Because the context is compact, the number of global context tokens is very limited, which makes GCVCE flexible and efficient for VSS. Since it may be too challenging to directly abstract a large number of video feature tokens into a small number of global context tokens, we further design a Cascaded Convolutional Downsampling (CCD) module before GCVCE to help it work better. A 1.6% improvement in mIoU on the popular VSPW dataset compared to previous state-of-the-art methods demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
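The pipeline described in the abstract (downsample the clip features, abstract them into a few global context tokens, then embed those tokens into the target frame) can be illustrated with a minimal NumPy sketch. The abstract does not specify the mechanism, so everything below is an assumption made for illustration: the function names, the use of dot-product cross-attention, the average pooling that stands in for the CCD module (which the abstract says uses cascaded convolutions), the residual addition, and the choice of the last frame as the target frame. It is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Plain scaled dot-product attention; keys and values share one matrix.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values

def embed_video_context(clip_feats, context_queries, pool=2):
    """Hypothetical sketch: clip_feats is (T, H, W, C), context_queries is (K, C)."""
    T, H, W, C = clip_feats.shape
    Hp, Wp = H // pool, W // pool
    # Assumption: average pooling stands in for the CCD module.
    pooled = clip_feats[:, :Hp * pool, :Wp * pool]
    pooled = pooled.reshape(T, Hp, pool, Wp, pool, C).mean(axis=(2, 4))
    video_tokens = pooled.reshape(-1, C)            # (T*Hp*Wp, C)
    # Abstract all video tokens into K compact global context tokens.
    context = cross_attention(context_queries, video_tokens)   # (K, C)
    # Assumption: the target frame is the last frame of the clip.
    target_tokens = clip_feats[-1].reshape(-1, C)   # (H*W, C)
    # Embed the context into the target features with a residual update.
    enhanced = target_tokens + cross_attention(target_tokens, context)
    return context, enhanced.reshape(H, W, C)

rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 8, 8, 16))    # 4 frames of 8x8 features, 16 channels
queries = rng.standard_normal((6, 16))       # 6 context queries (random here, learned in practice)
context, enhanced = embed_video_context(clip, queries)
print(context.shape, enhanced.shape)
```

In this sketch the target frame attends to only K context tokens rather than to all T*H*W video tokens, which is one way to read the abstract's claim that a compact context keeps the embedding step efficient.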
Pages: 135589 - 135600
Number of pages: 12