Global and Compact Video Context Embedding for Video Semantic Segmentation

Cited by: 0
Authors
Sun, Lei [1, 2]
Liu, Yun [3]
Sun, Guolei [2]
Wu, Min [3]
Xu, Zhijie [4]
Wang, Kaiwei [1]
Van Gool, Luc [2]
Affiliations
[1] Zhejiang Univ, Natl Res Ctr Opt Instrumentat, Hangzhou 310027, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
[3] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[4] Univ Huddersfield, Ctr Visual & Immers Comp, Huddersfield HD1 3DH, England
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China;
Keywords
Semantic segmentation; Context modeling; Feature extraction; Computational modeling; Sun; Optical flow; Shape; Video semantic segmentation; global video context; compact video context; video context embedding; NETWORK;
DOI
10.1109/ACCESS.2024.3409150
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Intuitively, global video context could benefit video semantic segmentation (VSS) if it is designed to simultaneously model global temporal and spatial dependencies for a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches focus only on modeling local video context. This paper attempts to bridge this gap by learning global video context for VSS. Apart from being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to improve their distinguishability. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Because the context is compact, the number of global context tokens is very limited, which makes GCVCE flexible and efficient for VSS. Since it may be too challenging to directly abstract a large number of video feature tokens into a small number of global context tokens, we further design a Cascaded Convolutional Downsampling (CCD) module before GCVCE to help it work better. A 1.6% improvement in mIoU on the popular VSPW dataset over previous state-of-the-art methods demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
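The abstract does not specify how the context tokens are computed, so the following is only an illustrative sketch of the general idea: many video feature tokens are summarized into a few global context tokens, which are then added back into the target frame's features. It assumes plain scaled dot-product attention with a residual connection. The names `compress_context`, `embed_context`, and `context_queries`, and all sizes, are hypothetical and not taken from the paper. Learned projections and the CCD downsampling stage are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, tokens):
    """Scaled dot-product attention; the same tokens serve as keys and values."""
    dim = len(tokens[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in tokens])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)]
        )
    return outputs


def compress_context(video_tokens, context_queries):
    """Assumption: a few query vectors pool all clip tokens into a few context tokens."""
    return attend(context_queries, video_tokens)


def embed_context(target_tokens, context_tokens):
    """Assumption: target tokens attend to the context; the result is added residually."""
    attended = attend(target_tokens, context_tokens)
    return [
        [t + a for t, a in zip(tok, att)]
        for tok, att in zip(target_tokens, attended)
    ]


# Hypothetical toy sizes: 4 frames x 16 tokens per frame, 8 channels, 3 context tokens.
random.seed(0)
DIM, NUM_FRAMES, TOKENS_PER_FRAME, NUM_CONTEXT = 8, 4, 16, 3
video_tokens = [
    [random.gauss(0, 1) for _ in range(DIM)]
    for _ in range(NUM_FRAMES * TOKENS_PER_FRAME)
]
# Random stand-ins for what would be learned query vectors.
context_queries = [
    [random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_CONTEXT)
]
target_tokens = video_tokens[-TOKENS_PER_FRAME:]  # treat the last frame as the target

context_tokens = compress_context(video_tokens, context_queries)
enhanced = embed_context(target_tokens, context_tokens)
print(len(video_tokens), "clip tokens ->", len(context_tokens), "context tokens")
print(len(enhanced), "enhanced target tokens of dimension", len(enhanced[0]))
```

In this sketch the embedding step scales with the number of target tokens times the number of context tokens, rather than with the full clip length. This is one way to read the abstract's claim that a very limited number of context tokens keeps the method efficient.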
Pages: 135589-135600
Page count: 12