An Image-Text Matching Method for Multi-Modal Robots

Cited by: 2

Authors:

Zheng, Ke [1]

Li, Zhou [1]

Affiliations:

[1] Hunan Biol & Electromech Polytech, Changsha, Peoples R China

Source:

JOURNAL OF ORGANIZATIONAL AND END USER COMPUTING | 2024 / Vol. 36 / Issue 01

Keywords:

Image-Text Matching; Multi-View Matching; Transformer;

DOI:

10.4018/JOEUC.334701

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

With the rapid development of artificial intelligence and deep learning, image-text matching has gradually become an important research topic in cross-modal fields. Achieving correct image-text matching requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires both a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words. How to integrate these two aspects into a single model remains a challenge. Additionally, reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also areas worth exploring, as they bear on two issues in existing fine-grained matching methods: excessive computational complexity and the lack of multi-perspective matching.


Pages: 21
