An Image-Text Matching Method for Multi-Modal Robots

Cited by: 2
Authors
Zheng, Ke [1]
Li, Zhou [1]
Affiliations
[1] Hunan Biol & Electromech Polytech, Changsha, Peoples R China
Keywords
Image-Text Matching; Multi-View Matching; Transformer;
DOI
10.4018/JOEUC.334701
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the rapid development of artificial intelligence and deep learning, image-text matching has become an important research topic in cross-modal fields. Correct image-text matching requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires both a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words. How to integrate these two aspects into a single model remains a challenge. Reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also worth exploring, in order to address the excessive computational complexity of existing fine-grained matching methods and their lack of multi-perspective matching.
Pages: 21
Related papers
46 entries in total
[1]   Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J].
Anderson, Peter ;
He, Xiaodong ;
Buehler, Chris ;
Teney, Damien ;
Johnson, Mark ;
Gould, Stephen ;
Zhang, Lei .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6077-6086
[2]  
[Anonymous], 2013, NeurIPS, DOI 10.48550/ARXIV.1310.4546
[3]   Research on the Influence Maximization Problem in Social Networks Based on the Multi-Functional Complex Networks Model [J].
Bin, Sheng ;
Sun, Gengxin .
JOURNAL OF ORGANIZATIONAL AND END USER COMPUTING, 2022, 34 (03)
[4]   Global Relation-Aware Attention Network for Image-Text Retrieval [J].
Cao, Jie ;
Qian, Shengsheng ;
Zhang, Huaiwen ;
Fang, Quan ;
Xu, Changsheng .
PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, :19-28
[5]  
CHANG SK, 1981, COMPUTER, V14, P13, DOI [10.1109/C-M.1981.220243, 10.1109/C-M.1981.220241]
[6]   IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [J].
Chen, Hui ;
Ding, Guiguang ;
Liu, Xudong ;
Lin, Zijia ;
Liu, Ji ;
Han, Jungong .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :12652-12660
[7]  
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[8]   Long-Term Recurrent Convolutional Networks for Visual Recognition and Description [J].
Donahue, Jeff ;
Hendricks, Lisa Anne ;
Rohrbach, Marcus ;
Venugopalan, Subhashini ;
Guadarrama, Sergio ;
Saenko, Kate ;
Darrell, Trevor .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (04) :677-691
[9]   Linking Image and Text with 2-Way Nets [J].
Eisenschtat, Aviv ;
Wolf, Lior .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1855-1865
[10]  
Fartash F., 2018, BRIT MACH VIS C, P935