Text-based Person Search via Multi-Granularity Embedding Learning

Authors
Wang, Chengji [1]
Luo, Zhiming [1]
Lin, Yaojin [2]
Li, Shaozi [1]
Affiliations
[1] Xiamen Univ, Dept Artificial Intelligence, Xiamen, Peoples R China
[2] Minnan Normal Univ, Sch Comp Sci, Zhangzhou, Peoples R China
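Source
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021
Funding
China Postdoctoral Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract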
Most existing text-based person search methods rely heavily on exploring the correspondences between image regions and the words of a sentence. However, these methods correlate image regions and words at the same semantic granularity, which 1) produces irrelevant correspondences between image and text, and 2) causes an ambiguous embedding problem. In this study, we propose a novel multi-granularity embedding learning model for text-based person search. It generates multi-granularity embeddings of partial person bodies in a coarse-to-fine manner by revisiting the person image at different spatial scales. Specifically, we distill the partial knowledge from image strips to guide the model to select the semantically relevant words from the text description, which enables it to learn discriminative and modality-invariant visual-textual embeddings. In addition, we integrate the partial embeddings at each granularity and perform multi-granularity image-text matching. Extensive experiments validate the effectiveness of our method, which achieves new state-of-the-art performance with the learned discriminative partial embeddings.
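The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea of coarse-to-fine part embeddings matched against words, under explicit assumptions that are not taken from the paper: the image is represented by a precomputed feature map split into 1, 2 and 3 horizontal strips as the granularities; dot-product softmax attention stands in as a hypothetical substitute for the paper's part-guided word selection (the knowledge distillation step is not modelled); each part is compared with its attended text vector by cosine similarity; and scores are averaged with equal weights over parts and granularities. All function names are hypothetical, and this is not the authors' implementation.

```python
# Illustrative sketch only; not the authors' implementation.
import math
from typing import List, Sequence

Vector = List[float]
FeatureMap = List[List[Vector]]  # feature_map[row][col] is a C-dimensional vector


def mean_vectors(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def strip_embeddings(feature_map: FeatureMap, num_strips: int) -> List[Vector]:
    """Split the map into num_strips horizontal strips and average-pool each one."""
    height = len(feature_map)
    if not 1 <= num_strips <= height:
        raise ValueError("num_strips must be between 1 and the feature-map height")
    # Integer boundaries; every strip gets at least one row when num_strips <= height.
    bounds = [i * height // num_strips for i in range(num_strips + 1)]
    parts = []
    for top, bottom in zip(bounds, bounds[1:]):
        cells = [cell for row in feature_map[top:bottom] for cell in row]
        parts.append(mean_vectors(cells))
    return parts


def attend_words(part: Vector, words: Sequence[Vector]) -> Vector:
    """Hypothetical part-guided word selection: dot-product softmax attention
    over the word embeddings, returning their weighted sum."""
    weights = softmax([sum(p * w for p, w in zip(part, word)) for word in words])
    return [sum(a * word[d] for a, word in zip(weights, words)) for d in range(len(part))]


def multi_granularity_score(
    feature_map: FeatureMap,
    words: Sequence[Vector],
    granularities: Sequence[int] = (1, 2, 3),
) -> float:
    """Average part-to-text cosine similarity within each granularity,
    then average over granularities with equal weights (an assumption)."""
    level_scores = []
    for k in granularities:
        parts = strip_embeddings(feature_map, k)
        sims = [cosine(part, attend_words(part, words)) for part in parts]
        level_scores.append(sum(sims) / len(sims))
    return sum(level_scores) / len(level_scores)
```

With toy inputs, a description whose word vectors align with the image strips scores higher than one whose vectors point the opposite way. In a trained model, the feature map and word embeddings would come from learned visual and textual encoders rather than being supplied by hand.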
Pages: 1068-1074
Page count: 7