An Empirical Study of Frame Selection for Text-to-Video Retrieval

Cited by: 0
Authors
Wu, Mengxia [1]
Cao, Min [1]
Bai, Yang [1]
Zeng, Ziyin [1]
Chen, Chen [2]
Nie, Liqiang [3]
Zhang, Min [1]
Affiliations
[1] Soochow Univ, Suzhou, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Harbin Inst Technol, Shenzhen, Peoples R China
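Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract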
Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text. The intricate and abundant context of a video challenges both the performance and the efficiency of TVR. To handle the serialized video context, existing methods typically select a subset of frames within a video to represent its content. How to select the most representative frames is a crucial issue: the selected frames must retain the semantic information of the video while also improving retrieval efficiency by excluding temporally redundant frames. In this paper, we present the first empirical study of frame selection for TVR. We systematically classify existing frame selection methods into text-free and text-guided ones, and within these two categories we analyze six frame selection methods in detail in terms of effectiveness and efficiency. Two of the six are newly developed in this paper. Based on a comprehensive analysis on multiple TVR benchmarks, we empirically conclude that TVR with proper frame selection can significantly improve retrieval efficiency without sacrificing retrieval performance.
Pages: 6821-6832
Number of pages: 12