Instance Motion Tendency Learning for Video Panoptic Segmentation

Cited by: 3

Authors:

Wang, Le [1,2]

Liu, Hongzhen [1]

Zhou, Sanping [1]

Tang, Wei [3]

Hua, Gang [4]

Affiliations:

[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China

[2] Shunan Acad Artificial Intelligence, Ningbo 315000, Zhejiang, Peoples R China

[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA

[4] Wormpex AI Res, Bellevue, WA 98004 USA

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Funding:

China Postdoctoral Science Foundation

Keywords:

Image segmentation; Motion segmentation; Task analysis; Tracking; Optical flow; Transformers; Target tracking; Video panoptic segmentation; motion tendency; boundary refinement; deep neural network

DOI:

10.1109/TIP.2022.3226414

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video panoptic segmentation is an important but challenging task in computer vision. It not only performs panoptic segmentation of each frame, but also associates the same instance across adjacent frames. Due to the lack of temporal coherence modeling, most existing approaches often generate identity switches during instance association, and they cannot handle ambiguous segmentation boundaries caused by motion blur. To address these difficult issues, we introduce a simple yet effective Instance Motion Tendency Network (IMTNet) for video panoptic segmentation. It learns a global motion tendency map for instance association, and a hierarchical classifier for motion boundary refinement. Specifically, a Global Motion Tendency Module (GMTM) is designed to learn robust motion features from optical flows, which can directly associate each instance in the previous frame to the corresponding instance in the current frame. In addition, we propose a Motion Boundary Refinement Module (MBRM) to learn a hierarchical classifier to handle the boundary pixels of moving targets, which can effectively revise the inaccurate segmentation predictions. Experimental results on both Cityscapes and Cityscapes-VPS datasets show that our IMTNet outperforms most state-of-the-art approaches.

Pages: 764-778

Page count: 15
