Instance Sequence Queries for Video Instance Segmentation with Transformers

被引:2
作者
Xu, Zhujun [1 ]
Vivet, Damien [1 ]
机构
[1] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, F-31400 Toulouse, France
关键词
video instance segmentation; transformer; query;
D O I
10.3390/s21134507
中图分类号
O65 [分析化学];
学科分类号
070302 ; 081704 ;
摘要
Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building a sophisticated post-processing to associate frame level segmentation results and (2) modeling a video clip as a 3D spatial-temporal volume with a limit of resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the Youtube-VIS dataset.
引用
收藏
页数:12
相关论文
共 28 条
[1]  
Athar Ali, 2020, Computer Vision - ECCV 2020 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12356), P158, DOI 10.1007/978-3-030-58621-8_10
[2]   Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation [J].
Bertasius, Gedas ;
Torresani, Lorenzo .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :9736-9745
[3]   YOLACT Real-time Instance Segmentation [J].
Bolya, Daniel ;
Zhou, Chong ;
Xiao, Fanyi ;
Lee, Yong Jae .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9156-9165
[4]   GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [J].
Cao, Yue ;
Xu, Jiarui ;
Lin, Stephen ;
Wei, Fangyun ;
Hu, Han .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, :1971-1980
[5]  
Carion N., 2020, ARXIV200512872
[6]   Hybrid Task Cascade for Instance Segmentation [J].
Chen, Kai ;
Pang, Jiangmiao ;
Wang, Jiaqi ;
Xiong, Yu ;
Li, Xiaoxiao ;
Sun, Shuyang ;
Feng, Wansen ;
Liu, Ziwei ;
Shi, Jianping ;
Ouyang, Wanli ;
Loy, Chen Change ;
Lin, Dahua .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4969-4978
[7]   TensorMask: A Foundation for Dense Object Segmentation [J].
Chen, Xinlei ;
Girshick, Ross ;
He, Kaiming ;
Dollar, Piotr .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :2061-2069
[8]  
Dosovitskiy A, 2021, ICLR 2021 9 INT C LE
[9]   Dual Embedding Learning for Video Instance Segmentation [J].
Feng, Qianyu ;
Yang, Zongxin ;
Li, Peike ;
Wei, Yunchao ;
Yang, Yi .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, :717-720
[10]  
He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]