Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation

Cited by: 1
Authors
Fan, Jiaqing [1]
Su, Tiankang [2]
Zhang, Kaihua [3]
Liu, Bo [4]
Liu, Qingshan [5]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Automat, Nanjing, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Minist Educ, Engn Res Ctr Digital Forens, Nanjing, Peoples R China
[4] Walmart Global Tech, Sunnyvale, CA USA
[5] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
Unsupervised video object segmentation; Gabor filtering; Video Transformer; Spatio-temporal information selection;
DOI
10.1145/3581783.3612017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Spatio-temporal structural details of targets in video (e.g., edges and textures that vary over time) are essential to accurate Unsupervised Video Object Segmentation (UVOS). The vanilla multi-head self-attention in Transformer-based UVOS methods usually concentrates on learning general low-frequency information (e.g., illumination, color) while neglecting high-frequency texture details, leading to unsatisfactory segmentation results. To address this issue, this paper presents a Temporally efficient Gabor Transformer (TGFormer) for UVOS. TGFormer jointly models spatial dependencies and temporal coherence within and across frames, so that it can fully capture the rich structural details needed for accurate UVOS. Concretely, we first propose an effective learnable Gabor filtering Transformer to mine the structural texture details of the object. Then, to adaptively store neighboring historical information, which is often redundant, we present an efficient dynamic neighboring frame selection module that automatically chooses the useful temporal information, which simultaneously mitigates the effect of blurry frames and reduces the computational burden. Finally, we build the UVOS model as a fully Transformer-based architecture that aggregates information from the spatial, Gabor, and temporal domains, yielding a strong representation with rich structural details. Extensive experiments on five mainstream UVOS benchmarks (DAVIS2016, FBMS, DAVSOD, ViSal, and MCL) demonstrate the superiority of the presented solution over state-of-the-art methods.
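The abstract's contrast between low-frequency information (illumination, color) and high-frequency texture detail can be illustrated with a plain, fixed 2D Gabor kernel. The sketch below is a textbook Gabor filter written with the Python standard library only. It is not the paper's learnable Gabor filtering Transformer, and the function names (gabor_kernel, total_response) and all parameter values are hypothetical choices made for illustration. The kernel is made zero-mean as an extra assumption, so that a uniformly lit patch produces no response while a striped texture does.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Build a zero-mean real Gabor kernel (size must be odd) as a list of rows."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinate frame by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope times a cosine carrier along the rotated x axis.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    # Subtract the mean so that constant (DC) input gives zero response.
    mean = sum(sum(r) for r in kernel) / (size * size)
    return [[v - mean for v in r] for r in kernel]


def total_response(image, kernel):
    """Sum of absolute filter responses over all valid kernel positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total = 0.0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += image[i + a][j + b] * kernel[a][b]
            total += abs(acc)
    return total


# A 16x16 patch of vertical stripes (period 4 pixels) and a flat patch.
striped = [[1.0 if (j % 4) < 2 else 0.0 for j in range(16)] for _ in range(16)]
flat = [[0.5 for _ in range(16)] for _ in range(16)]

# Kernel tuned to the stripe period and orientation (theta = 0).
kernel = gabor_kernel(size=7, sigma=2.0, theta=0.0, lambd=4.0)

print("striped response:", total_response(striped, kernel))
print("flat response:", total_response(flat, kernel))
```

The striped patch yields a large response and the flat patch a response that is zero up to floating-point error, which is the band-pass behavior that makes Gabor filters a natural tool for texture and edge detail. In a learnable variant, parameters such as sigma, theta, and lambd would be trained rather than fixed by hand.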
Pages: 3394-3402
Number of pages: 9