MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Cited by: 33
Authors
Ding, Henghui [1]
Liu, Chang [2]
He, Shuting [2]
Jiang, Xudong [2]
Loy, Chen Change [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Nanyang Technol Univ, Sch EEE, Singapore, Singapore
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV | 2023
DOI
10.1109/ICCV51070.2023.00254
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper strives for motion expressions guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. Existing referring video object datasets typically focus on salient objects and use language expressions that contain excessive static attributes that could potentially enable the target object to be identified in a single frame. These datasets downplay the importance of motion in video content for language-guided video object segmentation. To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. We benchmarked 5 existing referring video object segmentation (RVOS) methods and conducted a comprehensive comparison on the MeViS dataset. The results show that current RVOS methods cannot effectively address motion expression-guided video segmentation. We further analyze the challenges and propose a baseline approach for the proposed MeViS dataset. The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that leverage motion expressions as a primary cue for object segmentation in complex video scenes. The proposed MeViS dataset has been released at https://henghuiding.github.io/MeViS.
Pages: 2694-2703
Number of pages: 10