A W2VV++ Case Study with Automated and Interactive Text-to-Video Retrieval

Cited by: 14
Authors
Lokoc, Jakub [1]
Soucek, Tomas [1]
Vesely, Patrik [1]
Mejzlik, Frantisek [1]
Ji, Jiaqi [2]
Xu, Chaoxi [2]
Li, Xirong [2]
Affiliations
[1] Charles Univ Prague, Fac Math & Phys, Dept Software Engn, SIRET Res Grp, Prague, Czech Republic
[2] Renmin Univ China, Sch Informat, Key Lab Data Engn & Knowledge Engn, AI & Media Comp Grp, Beijing, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
datasets; neural networks; ad-hoc search; known-item search; representation learning; IMAGE RETRIEVAL; SEARCH;
DOI
10.1145/3394171.3414002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As reported by respected evaluation campaigns focusing on both automated and interactive video search approaches, deep learning has started to dominate the video retrieval area. However, the results are still not satisfactory for many types of search tasks focusing on high recall. To report on this challenging problem, we present two orthogonal task-based performance studies centered around the state-of-the-art W2VV++ query representation learning model for video retrieval. First, an ablation study is presented to investigate which components of the model are effective in two types of benchmark tasks focusing on high recall. Second, interactive search scenarios from the Video Browser Showdown are analyzed for two winning prototype systems implementing a selected variant of the model and providing additional querying and visualization components. The analysis of collected logs demonstrates that even with the state-of-the-art text search video retrieval model, it is still beneficial to integrate users into the search process for task types where high recall is essential.
Pages: 2553-2561
Page count: 9
Related papers
  • [1] An Empirical Study of Frame Selection for Text-to-Video Retrieval
    Wu, Mengxia
    Cao, Min
    Bai, Yang
    Zeng, Ziyin
    Chen, Chen
    Nie, Liqiang
    Zhang, Min
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6821 - 6832
  • [2] W2VV++: Fully Deep Learning for Ad-hoc Video Search
    Li, Xirong
    Xu, Chaoxi
    Yang, Gang
    Chen, Zhineng
    Dong, Jianfeng
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1786 - 1794
  • [3] Learning Text-to-Video Retrieval from Image Captioning
    Ventura, Lucas
    Schmid, Cordelia
    Varol, Gul
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, : 1834 - 1854
  • [4] Holistic Features are almost Sufficient for Text-to-Video Retrieval
    Tian, Kaibin
    Zhao, Ruixiang
    Xin, Zijie
    Lan, Bangxiang
    Li, Xirong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 17138 - 17147
  • [5] Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
    Hu, Fan
    Chen, Aozhu
    Wang, Ziyue
    Zhou, Fangming
    Dong, Jianfeng
    Li, Xirong
    COMPUTER VISION - ECCV 2022, PT XIV, 2022, 13674 : 444 - 461
  • [6] Write What You Want: Applying Text-to-Video Retrieval to Audiovisual Archives
    Yang, Yuchen
    ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, 2023, 16 (04):
  • [7] Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks
    Rodriguez, Pedro
    Azab, Mahmoud
    Silvert, Becka
    Sanchez, Renato
    Labson, Linzy
    Shah, Hardik
    Moon, Seungwhan
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 47 - 68
  • [8] Relation Triplet Construction for Cross-modal Text-to-Video Retrieval
    Song, Xue
    Chen, Jingjing
    Jiang, Yu-Gang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4759 - 4767
  • [9] Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
    Ibrahimi, Sarah
    Sun, Xiaohang
    Wang, Pichao
    Garg, Amanmeet
    Sanan, Ashutosh
    Omar, Mohamed
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 12020 - 12030
  • [10] Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval
    Yin, Shukang
    Zhao, Sirui
    Wang, Hao
    Xu, Tong
    Chen, Enhong
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (10)