Unsupervised Video Object Segmentation via Weak User Interaction and Temporal Modulation

Cited by: 1

Authors:

Fan Jiaqing [1]

Zhang Kaihua [2,3]

Zhao Yaqian [4]

Liu Qingshan [2,3]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China

[2] Nanjing Univ Informat Sci & Technol, Coll Comp & Software, Nanjing 210044, Peoples R China

[3] Minist Educ, Engn Res Ctr Digital Forens, Nanjing 210044, Peoples R China

[4] Inspur Suzhou Intelligent Technol Corp, Suzhou 215000, Peoples R China

Source:

CHINESE JOURNAL OF ELECTRONICS | 2023 / Vol. 32 / No. 03

Funding:

National Natural Science Foundation of China;

Keywords:

Unsupervised video object segmentation; The earth mover's distance (EMD)-based modulation; Cross-squeeze modulation; Weak interaction; Region-based convolutional neural networks (RCNN); EARTH-MOVERS-DISTANCE;

DOI:

10.23919/cje.2022.00.139

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In unsupervised video object segmentation (UVOS), a model may segment the wrong target throughout the video because it lacks initial prior information. In semi-supervised video object segmentation (SVOS), an initial video frame with a fine-grained pixel-level mask is essential to good segmentation accuracy, yet providing accurate pixel-level masks for each training sequence is expensive and laborious. To address this issue, we present a weakly user-interactive UVOS approach guided by a simple human-made rectangle annotation in the initial frame. We first interactively draw a rectangle around the region of interest, and then leverage the Mask RCNN (region-based convolutional neural network) method to generate a set of coarse reference labels for subsequent mask propagation. To establish the temporal correspondence between coherent frames, we further design two novel temporal modulation modules to enhance the target representations. First, we compute the earth mover's distance (EMD)-based similarity between coherent frames to mine the co-occurrent objects in the two images, which is used to modulate the target representation to highlight the foreground target. Second, we design a cross-squeeze temporal modulation module to emphasize the co-occurrent features across frames, which further helps to enhance the foreground target representation. We then augment the original representation with the temporally modulated representations to obtain composite spatio-temporal information, producing a more accurate video object segmentation (VOS) model. Experimental results on both UVOS and SVOS datasets, including DAVIS2016, FBMS, YouTube-VOS, and DAVIS2017, show that our method yields favorable accuracy and complexity. The related code is available.
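
As a rough illustration of the EMD-based modulation idea mentioned in the abstract, the sketch below computes the earth mover's distance between two one-dimensional histograms, turns it into a similarity score, and uses that score to re-weight a feature vector while keeping the original signal. It is not the authors' implementation: the function names `emd_1d` and `modulate`, the mapping `1 / (1 + distance)`, and the residual form `features * (1 + similarity)` are all assumptions chosen for this example, and the paper's modules presumably operate on deep feature maps rather than toy histograms.

```python
import numpy as np


def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms on unit-spaced bins.

    Both inputs are normalized to sum to 1. In one dimension the optimal
    transport cost equals the L1 distance between the cumulative sums.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())


def modulate(features, similarity):
    """Hypothetical modulation: add similarity-weighted features to the originals."""
    features = np.asarray(features, dtype=float)
    return features * (1.0 + similarity)


# Toy histograms standing in for descriptors of two frames (assumed data).
ref_hist = [0.0, 1.0, 0.0, 0.0]
cur_hist = [0.0, 0.0, 1.0, 0.0]

distance = emd_1d(ref_hist, cur_hist)  # all mass moves by one bin
similarity = 1.0 / (1.0 + distance)    # assumed distance-to-similarity mapping
modulated = modulate(np.ones(3), similarity)
```

A smaller distance gives a similarity closer to 1, so features of frames that resemble the reference are amplified more strongly, which mirrors the abstract's description of highlighting co-occurrent content at a purely conceptual level.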

Citation

Pages: 507-518

Number of pages: 12
