Dual Semantic Relationship Attention Network for Image-Text Matching

Cited: 0
Authors
Wen, Keyu [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Source
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2020
Funding
National Natural Science Foundation of China;
Keywords
cross-modal; retrieval; attention; semantic relationship;
DOI
10.1109/ijcnn48605.2020.9206782
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text matching is a major task in cross-modal information processing. Its main challenge is learning unified vision and language representations. Previous methods that perform well on this task focus primarily on the image region features that correspond to words in the sentences. However, this causes the region features to lose contact with the global context, leading to mismatches with the non-object words that appear in some sentences. To alleviate this problem, this work proposes a novel Dual Semantic Relationship Attention Network consisting of two main modules: a separate semantic relationship module and a joint semantic relationship module. Together, these two modules learn different hierarchies of semantic relationships simultaneously, thereby improving the image-text matching process. Quantitative experiments on MS-COCO and Flickr-30K show that the method outperforms previous approaches by a large margin, owing to the effectiveness of the dual semantic relationship attention scheme.
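The abstract describes region-word attention for scoring an image-sentence pair. The sketch below is not the paper's architecture; it is a generic, minimal illustration of the underlying idea (each word attends over detected image regions, and the attended visual context is compared back to the word), with all names, dimensions, and the temperature value chosen for illustration only.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize vectors to unit length so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def attention_match_score(regions, words, temperature=4.0):
    """Score one image-sentence pair: each word attends over the
    image regions, then word/context cosine similarities are averaged."""
    regions = l2_normalize(regions)            # (R, d) region features
    words = l2_normalize(words)                # (W, d) word features
    sim = words @ regions.T                    # (W, R) word-region cosines
    attn = np.exp(temperature * sim)
    attn /= attn.sum(axis=1, keepdims=True)    # softmax over regions
    context = l2_normalize(attn @ regions)     # (W, d) attended context
    return float(np.mean(np.sum(words * context, axis=1)))

# Toy example: 36 region features (as a Faster R-CNN detector might
# produce) and 8 word embeddings, all random for illustration.
rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 64))
words = rng.normal(size=(8, 64))
score = attention_match_score(regions, words)
```

Since both the words and the attended contexts are unit-normalized, the resulting score is an average of cosines and lies in [-1, 1]; in a retrieval setting such scores would be ranked across candidate images or sentences.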
Pages: 7
Cited References
36 in total
  • [21] Matsubara T., 2019, arXiv:1910.06514
  • [22] Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
    Plummer, Bryan A.
    Wang, Liwei
    Cervantes, Chris M.
    Caicedo, Juan C.
    Hockenmaier, Julia
    Lazebnik, Svetlana
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2641 - 2649
  • [23] Radford L, 2018, ICME-13 MONOGR, P3, DOI 10.1007/978-3-319-68351-5_1
  • [24] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
    Ren, Shaoqing
    He, Kaiming
    Girshick, Ross
    Sun, Jian
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (06) : 1137 - 1149
  • [25] Schuster S., 2015, P 4 WORKSH VIS LANG, P70
  • [26] Simonyan K., 2015, arXiv:1409.1556
  • [27] Vaswani A., 2017, Advances in Neural Information Processing Systems, P6000, DOI 10.48550/ARXIV.1706.03762
  • [28] Velickovic P., 2017, STAT, P1, DOI 10.48550/ARXIV.1710.10903
  • [29] Adversarial Cross-Modal Retrieval
    Wang, Bokun
    Yang, Yang
    Xu, Xing
    Hanjalic, Alan
    Shen, Heng Tao
    [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 154 - 162
  • [30] Wang S., 2019, arXiv:1910.05134