Deep Multimodal Learning for Information Retrieval

Cited by: 0
Authors
Ji, Wei [1]
Wei, Yinwei [2]
Zheng, Zhedong [1]
Fei, Hao [1]
Chua, Tat-Seng [1]
Affiliations
[1] National University of Singapore, Singapore
[2] Monash University, Clayton, VIC, Australia
Source
Proceedings of the 31st ACM International Conference on Multimedia, MM 2023 | 2023
Keywords
Information retrieval; Multi-modal; CLIP
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information retrieval (IR) is a fundamental technique for acquiring information from a collection of documents, web pages, or other sources. While traditional text-based IR has achieved great success, the under-utilization of data sources in different modalities (i.e., text, images, audio, and video) keeps IR techniques from reaching their full potential and thus limits their real-world applications. In recent years, the rapid development of deep multimodal learning has paved the way for advancing IR with multiple modalities. Benefiting from a variety of data types and modalities, recent techniques such as CLIP, ChatGPT, and GPT-4 have greatly facilitated multimodal learning and IR. In the context of IR, deep multimodal learning has shown strong potential to improve the performance of retrieval systems by enabling them to better understand and process the diverse types of data they encounter. Despite the great potential of multimodal-empowered IR, unsolved challenges and open questions remain in related directions. With this workshop, we aim to provide a platform for scholars, practitioners, and other interested parties to discuss multimodal IR.
Pages: 9739-9741
Page count: 3