Joint embeddings with multimodal cues for video-text retrieval

Authors
Niluthpol C. Mithun
Juncheng Li
Florian Metze
Amit K. Roy-Chowdhury
Affiliations
[1] University of California,
[2] Carnegie Mellon University
Source
International Journal of Multimedia Information Retrieval | 2019 / Vol. 8
Keywords
Video-text retrieval; Joint embedding; Multimodal cues;
Abstract
For multimedia applications, a joint representation that carries information from multiple modalities can be very useful for downstream use cases. In this paper, we study how to effectively utilize the multimodal cues available in videos when learning joint representations for the cross-modal video-text retrieval task. Existing hand-labeled video-text datasets are often very limited in size considering the enormous diversity of the visual world. This makes it extremely difficult to develop a robust video-text retrieval system based on deep neural network models. To address this, we propose a framework that simultaneously utilizes multimodal visual cues through a “mixture of experts” approach for retrieval. We conduct extensive experiments to verify that our system improves retrieval performance compared to the state of the art. In addition, we propose a modified pairwise ranking loss function for training the embedding and study the effect of various loss functions. Experiments on two benchmark datasets show that our approach yields significant gains compared to the state of the art.
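As a rough illustration of the kind of objective the abstract refers to, the sketch below implements a generic bidirectional max-margin pairwise ranking loss over a joint video-text embedding space. It is not the modified loss proposed in the paper, whose details are not given here. The function names, the cosine similarity choice, and the margin value of 0.2 are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_ranking_loss(video_embs, text_embs, margin=0.2):
    """Generic bidirectional max-margin ranking loss (hypothetical helper).

    video_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    n = len(video_embs)
    # sim[i][j] is the similarity between video i and text j.
    sim = [[cosine(v, t) for t in text_embs] for v in video_embs]
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Video i as query: a wrong text j should score below the match.
            loss += max(0.0, margin - pos + sim[i][j])
            # Text i as query: a wrong video j should score below the match.
            loss += max(0.0, margin - pos + sim[j][i])
    return loss / n


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    print(pairwise_ranking_loss(videos, texts))
```

When matching pairs are already more similar than every mismatched pair by at least the margin, the loss is zero; otherwise each violating negative contributes in proportion to how far it intrudes on the margin.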
Pages: 3 / 18
Page count: 15