A W2VV++ Case Study with Automated and Interactive Text-to-Video Retrieval

Cited by: 14

Authors:

Lokoc, Jakub [1]

Soucek, Tomas [1]

Vesely, Patrik [1]

Mejzlik, Frantisek [1]

Ji, Jiaqi [2]

Xu, Chaoxi [2]

Li, Xirong [2]

Affiliations:

[1] Charles Univ Prague, Fac Math & Phys, Dept Software Engn, SIRET Res Grp, Prague, Czech Republic

[2] Renmin Univ China, Sch Informat, Key Lab Data Engn & Knowledge Engn, AI & Media Comp Grp, Beijing, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China

Keywords:

datasets; neural networks; ad-hoc search; known-item search; representation learning; image retrieval; search

DOI:

10.1145/3394171.3414002

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

As reported by respected evaluation campaigns focusing both on automated and interactive video search approaches, deep learning started to dominate the video retrieval area. However, the results are still not satisfactory for many types of search tasks focusing on high recall. To report on this challenging problem, we present two orthogonal task-based performance studies centered around the state-of-the-art W2VV++ query representation learning model for video retrieval. First, an ablation study is presented to investigate which components of the model are effective in two types of benchmark tasks focusing on high recall. Second, interactive search scenarios from the Video Browser Showdown are analyzed for two winning prototype systems implementing a selected variant of the model and providing additional querying and visualization components. The analysis of collected logs demonstrates that even with the state-of-the-art text search video retrieval model, it is still auspicious to integrate users into the search process for task types, where high recall is essential.

Pages: 2553-2561

Page count: 9
