A W2VV++ Case Study with Automated and Interactive Text-to-Video Retrieval

Cited by: 15

Authors:

Lokoc, Jakub [1]

Soucek, Tomas [1]

Vesely, Patrik [1]

Mejzlik, Frantisek [1]

Ji, Jiaqi [2]

Xu, Chaoxi [2]

Li, Xirong [2]

Affiliations:

[1] Charles Univ Prague, Fac Math & Phys, Dept Software Engn, SIRET Res Grp, Prague, Czech Republic

[2] Renmin Univ China, Sch Informat, Key Lab Data Engn & Knowledge Engn, AI & Media Comp Grp, Beijing, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China

Keywords:

datasets; neural networks; ad-hoc search; known-item search; representation learning; IMAGE RETRIEVAL; SEARCH;

DOI:

10.1145/3394171.3414002

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As reported by respected evaluation campaigns focusing both on automated and interactive video search approaches, deep learning started to dominate the video retrieval area. However, the results are still not satisfactory for many types of search tasks focusing on high recall. To report on this challenging problem, we present two orthogonal task-based performance studies centered around the state-of-the-art W2VV++ query representation learning model for video retrieval. First, an ablation study is presented to investigate which components of the model are effective in two types of benchmark tasks focusing on high recall. Second, interactive search scenarios from the Video Browser Showdown are analyzed for two winning prototype systems implementing a selected variant of the model and providing additional querying and visualization components. The analysis of collected logs demonstrates that even with the state-of-the-art text search video retrieval model, it is still auspicious to integrate users into the search process for task types, where high recall is essential.

Citation

Pages: 2553-2561

Page count: 9

References: 39 in total
