Mask generation dynamically regulates weakly supervised video instance segmentation

Cited by: 0
Authors
He Z. [1]
Xu L. [1]
Zhang Y. [1]
Huang Y. [1]
Affiliation
[1] Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming
Source
Guangxue Jingmi Gongcheng/Optics and Precision Engineering | 2023, Vol. 31, No. 19
Keywords
binary color similarity; dynamic regulation; intelligent machine; multi-level feature fusion; weakly supervised video instance segmentation
DOI
10.37188/OPE.20233119.2884
Abstract
Fully supervised video instance segmentation networks depend on accurate mask annotations, which are costly in labor and time to produce; as a result, intelligent machines cannot quickly adapt to new scenes. To address this, a weakly supervised video instance segmentation (WSVIS) network with dynamically regulated mask generation is proposed. First, to overcome the loss of instance activation features caused by the abrupt reduction in channel dimension at the initial mask prediction layer, a multi-level feature fusion module predicts the initial instance features through a step-by-step feature reuse strategy and generates the initial mask by fusing relative position information. Second, a dynamic regulation mechanism establishes mask feature dependencies in the channel and spatial dimensions to strengthen the dynamic interaction between the initial predicted mask and instance-aware information. Finally, the network replaces fine mask labeling with the binary color similarity of images and a bounding box consistency loss, so that the mask supervision of fully supervised video instance segmentation is replaced by bounding box labeling only. Experimental results show that on the BoxSet and YT-VIS datasets, the WSVIS network achieves segmentation accuracy and quality similar to those of the fully supervised network and supports real-time inference, providing theoretical support and an algorithmic basis for intelligent machines to quickly adapt to new scenes and achieve real-time environmental perception and understanding. © 2023 Chinese Academy of Sciences. All rights reserved.
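The abstract names two box-only supervision signals, a bounding box consistency loss and a binary color similarity, but gives no formulas. The following is a minimal sketch of how such terms are commonly built, assuming a projection-based Dice formulation and an exponential color affinity with a threshold. The function names and the values of `theta` and `tau` are hypothetical and are not taken from the paper.

```python
import math

def project(mask, axis):
    """Max-project a 2D mask (list of rows): axis=0 gives one value per column, axis=1 one per row."""
    if axis == 0:
        return [max(col) for col in zip(*mask)]
    return [max(row) for row in mask]

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between two equal-length vectors."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def box_consistency_loss(mask_prob, box_mask):
    """Compare the axis projections of the predicted mask with those of the box (assumed formulation)."""
    loss_cols = dice_loss(project(mask_prob, 0), project(box_mask, 0))
    loss_rows = dice_loss(project(mask_prob, 1), project(box_mask, 1))
    return loss_cols + loss_rows

def binary_color_similarity(c1, c2, theta=2.0, tau=0.3):
    """Return 1 if two pixel colors are similar enough to share a label, else 0.

    theta (scale) and tau (threshold) are illustrative values, not the paper's.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))
    return 1 if math.exp(-dist / theta) >= tau else 0

# A 4x4 frame whose annotated box covers rows 1-2 and columns 1-2.
box = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
empty = [[0.0] * 4 for _ in range(4)]

loss_match = box_consistency_loss(box, box)    # prediction fills the box
loss_empty = box_consistency_loss(empty, box)  # prediction is empty
```

The projection term only constrains the extent of the mask along each axis, so a mask can match the box projections without matching the object's shape. In this kind of scheme, the color similarity term supplies the missing signal by encouraging neighboring pixels with similar colors to receive the same label.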
Pages: 2884-2897
Number of pages: 13