A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

Times cited: 4

Authors:

Shang, Heng [1]

Zhao, Guoshuai [1]

Shi, Jing [1]

Qian, Xueming [2]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China

[2] Xi An Jiao Tong Univ, SMILES Lab, Xian 710049, Peoples R China

Source:

IEEE INTELLIGENT SYSTEMS | 2023 / Vol. 38 / No. 03

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

Feature extraction; Semantics; Text mining; Intelligent systems; Image representation; Task analysis; Image edge detection;

DOI:

10.1109/MIS.2023.3265176

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In image-text matching, one key to improving performance is extracting features that carry more semantic information. Existing works demonstrate that enriching semantics through knowledge expansion can improve performance. Most of them expand image features; however, the shortage of semantic information in the text modality and the reliance on a single, one-sided view are often bottlenecks that limit the performance of image-text matching models. To address these two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG expands textual semantic information by imagination based on the input images. Then, using the WIG, we construct a novel multiview text imagination network (MTIN). An MTIN enables latent alignment of images and texts on tags, which assists matching at the semantic level. Results on the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub: https://github.com/smileslabsh/Multiview-Text-Imagination-Network.
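To make the idea of matching "on tags" more concrete, here is a toy sketch only; it is not the WIG or MTIN implementation (that is in the released source code). It assumes each image and caption has already been encoded into an embedding vector plus a score vector over one shared tag vocabulary, and it simply mixes embedding similarity with tag-level agreement. The function names `cosine`, `match_score`, and `rank_captions`, the weight `alpha`, and all numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_score(img_emb, txt_emb, img_tags, txt_tags, alpha=0.5):
    """Hypothetical score: weighted mix of embedding similarity and tag agreement.

    img_tags and txt_tags are assumed to be scores over the SAME tag vocabulary,
    which acts as a shared "latent" space where both modalities can be compared.
    """
    feature_sim = cosine(img_emb, txt_emb)  # global feature-level similarity
    tag_sim = cosine(img_tags, txt_tags)    # semantic (tag-level) agreement
    return (1.0 - alpha) * feature_sim + alpha * tag_sim


def rank_captions(image, captions, alpha=0.5):
    """Return caption strings sorted from best to worst match for one image."""
    return sorted(
        captions,
        key=lambda c: match_score(image["emb"], captions[c]["emb"],
                                  image["tags"], captions[c]["tags"], alpha),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy tag vocabulary (hypothetical): [dog, grass, car]
    image = {"emb": [0.9, 0.1, 0.0], "tags": [0.8, 0.6, 0.0]}
    captions = {
        "a dog on the grass": {"emb": [0.8, 0.2, 0.1], "tags": [1.0, 1.0, 0.0]},
        "a red car parked outside": {"emb": [0.1, 0.2, 0.9], "tags": [0.0, 0.0, 1.0]},
    }
    print(rank_captions(image, captions))
    # ['a dog on the grass', 'a red car parked outside']
```

Setting `alpha=0` reduces the score to plain embedding similarity; larger values let agreement on tags, rather than raw features alone, decide the ranking.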


Pages: 41-50

Page count: 10
