Video Re-localization

Cited by: 30

Authors:

Feng, Yang [1,2]

Ma, Lin [1]

Liu, Wei [1]

Zhang, Tong [1]

Luo, Jiebo [2]

Affiliations:

[1] Tencent AI Lab, Shenzhen, Peoples R China

[2] Univ Rochester, Rochester, NY 14627 USA

Source:

COMPUTER VISION - ECCV 2018, PT XIV | 2018 / Vol. 11218

Funding:

National Science Foundation (United States);

Keywords:

Video re-localization; Cross gating; Bilinear matching;

DOI:

10.1007/978-3-030-01264-9_4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many methods have been developed to help people find the video content they want efficiently. However, there are still some unsolved problems in this area. For example, given a query video and a reference video, how can one accurately localize a segment in the reference video such that the segment semantically corresponds to the query video? We define a distinctively new task, namely video re-localization, to address this need. Video re-localization is an important enabling technology with many applications, such as fast seeking in videos, video copy detection, and video surveillance. Meanwhile, it is also a challenging research task because the visual appearance of a semantic concept in videos can have large variations. The first hurdle to clear for the video re-localization task is the lack of existing datasets. It is labor-intensive to collect pairs of videos with semantic coherence or correspondence and to label the corresponding segments. We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10,000 videos of diverse visual appearances associated with the localized boundary information. Subsequently, we propose an innovative cross-gated bilinear matching model such that every time step in the reference video is matched against the attentively weighted query video. Consequently, the prediction of the starting and ending time is formulated as a classification problem based on the matching results. Extensive experimental results show that the proposed method outperforms the baseline methods. Our code is available at: https://github.com/fengyang0317/video_reloc.
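The matching pipeline outlined in the abstract (attentively weighted query, cross gating, bilinear matching, boundary classification) can be illustrated with a small NumPy sketch. This is only one plausible reading of the abstract, not the authors' implementation: the function names `init_params` and `relocalize`, all weight names, the random untrained weights, and the choice to treat boundary prediction as a softmax over reference time steps are assumptions made for illustration. The released code may differ in every one of these details.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d, k, rng):
    """Hypothetical, untrained weights: d = feature size, k = bilinear maps."""
    s = 1.0 / np.sqrt(d)
    return {
        "W_att": rng.normal(0.0, s, (d, d)),
        "W_gate_r": rng.normal(0.0, s, (d, d)),
        "W_gate_q": rng.normal(0.0, s, (d, d)),
        "W_bil": rng.normal(0.0, s, (k, d, d)),
        "w_start": rng.normal(0.0, 1.0, k),
        "w_end": rng.normal(0.0, 1.0, k),
    }


def relocalize(query, reference, p):
    """query: (Tq, d) features, reference: (Tr, d) features.

    Returns (start, end, p_start, p_end) over reference time steps.
    """
    # 1. Attention: each reference step gets its own weighted summary of the query.
    scores = reference @ p["W_att"] @ query.T            # (Tr, Tq)
    attended = softmax(scores, axis=1) @ query           # (Tr, d)

    # 2. Cross gating: each stream is scaled by a gate computed from the other.
    gated_query = attended * sigmoid(reference @ p["W_gate_r"])
    gated_ref = reference * sigmoid(attended @ p["W_gate_q"])

    # 3. Bilinear matching: k bilinear forms per reference time step.
    match = np.tanh(
        np.einsum("td,kde,te->tk", gated_query, p["W_bil"], gated_ref)
    )                                                    # (Tr, k)

    # 4. Classification (assumed form): a softmax over time steps for each boundary.
    p_start = softmax(match @ p["w_start"])              # (Tr,)
    p_end = softmax(match @ p["w_end"])                  # (Tr,)

    # Pick the most probable pair with start <= end (upper triangle only).
    joint = np.triu(np.outer(p_start, p_end))
    start, end = np.unravel_index(joint.argmax(), joint.shape)
    return int(start), int(end), p_start, p_end


rng = np.random.default_rng(0)
params = init_params(d=8, k=4, rng=rng)
query_feats = rng.normal(size=(5, 8))       # 5 query time steps
reference_feats = rng.normal(size=(12, 8))  # 12 reference time steps
start, end, p_start, p_end = relocalize(query_feats, reference_feats, params)
print(start, end)
```

With untrained weights the predicted segment is meaningless; the sketch only shows how the shapes flow from per-step attention through gating and matching to a start/end decision that is constrained to satisfy start <= end.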

Pages: 55-70

Number of pages: 16
