Deep Multimodal Learning for Information Retrieval

Cited by: 0

Authors:

Ji, Wei [1]

Wei, Yinwei [2]

Zheng, Zhedong [1]

Fei, Hao [1]

Chua, Tat-Seng [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

[2] Monash Univ, Clayton, Vic, Australia

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

Information retrieval; Multi-modal; CLIP;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Information retrieval (IR) is a fundamental technique that aims to acquire information from a collection of documents, web pages, or other sources. While traditional text-based IR has achieved great success, the under-utilization of data sources in different modalities (e.g., text, images, audio, and video) keeps IR techniques from reaching their full potential and thus limits their real-world applications. In recent years, the rapid development of deep multimodal learning has paved the way for advancing IR with multiple modalities. Drawing on a variety of data types and modalities, recent techniques such as CLIP, ChatGPT, and GPT-4 have greatly facilitated multi-modal learning and IR. In the context of IR, deep multi-modal learning has shown strong potential to improve the performance of retrieval systems by enabling them to better understand and process the diverse types of data they encounter. Despite the great potential of multimodal-empowered IR, unsolved challenges and open questions remain in the related directions. With this workshop, we aim to provide a platform for discussion about multi-modal IR among scholars, practitioners, and other interested parties.


Pages: 9739-9741

Page count: 3

22 references in total

[1] Scalable Deep Hashing for Large-Scale Social Image Retrieval
Cui, Hui
Zhu, Lei
Li, Jingjing
Yang, Yang
Nie, Liqiang
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 1271 - 1284
[2] Balanced neural architecture search and optimization for specific emitter identification
Du, Mingyang
Zhong, Ping
Cai, Xiaohao
Bi, Daping
Li, Zhifei
[J]. 2022 IEEE 12TH INTERNATIONAL CONFERENCE ON RFID TECHNOLOGY AND APPLICATIONS (RFID-TA), 2022, : 220 - 223
[3] Du Yali, 2023, P ACM INT C WEB SEAR, DOI 10.1145/3539597.3570405
[4] VidVRD 2021: The Third Grand Challenge on Video Relation Detection
Ji, Wei
Li, Yicong
Wei, Meng
Shang, Xindi
Xiao, Junbin
Ren, Tongwei
Chua, Tat-Seng
[J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4779 - 4783
[5] Human-Centric Clothing Segmentation via Deformable Semantic Locality-Preserving Network
Ji, Wei
Li, Xi
Wu, Fei
Pan, Zhijie
Zhuang, Yueting
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4837 - 4848
[6] Ji Wei, 2022, ARXIV221213163
[7] Ji Wei, 2023, ARE BINARY ANNOTATIO
[8] What Aspect Do You Like: Multi-scale Time-aware User Interest Modeling for Micro-video Recommendation
Jiang, Hao
Wang, Wenjie
Wei, Yinwei
Gao, Zan
Wang, Yinglong
Nie, Liqiang
[J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3487 - 3495
[9] Visual Semantic Reasoning for Image-Text Matching
Li, Kunpeng
Zhang, Yulun
Li, Kai
Li, Yuanyuan
Fu, Yun
[J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4653 - 4661
[10] Self-Supervised Correlation Learning for Cross-Modal Retrieval
Liu, Yaxin
Wu, Jianlong
Qu, Leigang
Gan, Tian
Yin, Jianhua
Nie, Liqiang
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
