Global and Compact Video Context Embedding for Video Semantic Segmentation

Times cited: 0
Authors
Sun, Lei [1 ,2 ]
Liu, Yun [3 ]
Sun, Guolei [2 ]
Wu, Min [3 ]
Xu, Zhijie [4 ]
Wang, Kaiwei [1 ]
Van Gool, Luc [2 ]
Affiliations
[1] Zhejiang Univ, Natl Res Ctr Opt Instrumentat, Hangzhou 310027, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
[3] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[4] Univ Huddersfield, Ctr Visual & Immers Comp, Huddersfield HD1 3DH, England
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Context modeling; Feature extraction; Computational modeling; Sun; Optical flow; Shape; Video semantic segmentation; global video context; compact video context; video context embedding; NETWORK;
DOI
10.1109/ACCESS.2024.3409150
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Intuitively, global video context could benefit video semantic segmentation (VSS) if it is designed to simultaneously model global temporal and spatial dependencies, giving a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches focus only on modeling local video context. This paper attempts to bridge this gap by learning global video context for VSS. Besides being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to make them more distinguishable. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Because the context is compact, the number of global context tokens is very small, which makes GCVCE flexible and efficient for VSS. Since directly abstracting a large number of video feature tokens into a small number of global context tokens may be too challenging, we further design a Cascaded Convolutional Downsampling (CCD) module, placed before GCVCE, to help it work better. A 1.6% mIoU improvement over previous state-of-the-art methods on the popular VSPW dataset demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
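The abstract does not specify how the global context tokens are computed or how they are embedded. As a rough illustration only, the sketch below assumes a common query-based design: a small set of K query tokens (the hypothetical `context_queries`) cross-attends over the downsampled tokens of all frames to form a compact global context, and the target frame's features then attend to those K tokens and add the result residually. The `downsample` function is a hypothetical stand-in for CCD that uses repeated 2x2 average pooling instead of learned cascaded strided convolutions. All names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: q (m, d), k and v (n, d) -> (m, d).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def downsample(feat, steps=2):
    # Hypothetical CCD stand-in: `steps` rounds of 2x2 average pooling
    # (assumption; the paper describes cascaded convolutional downsampling).
    for _ in range(steps):
        t, h, w, d = feat.shape
        feat = feat[:, : h // 2 * 2, : w // 2 * 2]  # crop to even size
        feat = feat.reshape(t, h // 2, 2, w // 2, 2, d).mean(axis=(2, 4))
    return feat

def gcvce_sketch(video_feat, context_queries, target_idx=-1, steps=2):
    # video_feat: (T, H, W, D) per-frame features.
    # context_queries: (K, D), stand-in for learned query tokens, K small.
    t, h, w, d = video_feat.shape
    pooled = downsample(video_feat, steps)  # (T, H', W', D)
    tokens = pooled.reshape(-1, d)  # tokens from every frame and position
    # Compact global context: K tokens summarizing the whole clip.
    context = attention(context_queries, tokens, tokens)  # (K, D)
    # Embed the context into the target frame with a residual connection.
    target = video_feat[target_idx].reshape(-1, d)  # (H*W, D)
    enhanced = target + attention(target, context, context)
    return enhanced.reshape(h, w, d), context

rng = np.random.default_rng(0)
video = rng.standard_normal((4, 32, 32, 64))  # 4 frames of 32x32x64 features
queries = rng.standard_normal((8, 64))  # K = 8 context tokens
out, ctx = gcvce_sketch(video, queries)
print(out.shape, ctx.shape)  # prints (32, 32, 64) (8, 64)
```

In this toy run, two pooling stages shrink 4 x 32 x 32 = 4096 tokens to 4 x 8 x 8 = 256, and 8 query tokens summarize them. Under these assumptions, building the context costs on the order of N x K attention scores for N video tokens rather than N x N for full self-attention over the clip, which is one way to read the abstract's point that a small number of context tokens keeps the method efficient.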
Pages: 135589-135600
Number of pages: 12