Improving Text-based Person Search by Spatial Matching and Adaptive Threshold

Cited by: 86

Authors:

Chen, Tianlang [1]

Xu, Chenliang [1]

Luo, Jiebo [1]

Affiliation:

[1] Univ Rochester, Rochester, NY 14627 USA

Source:

2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018) | 2018

DOI:

10.1109/WACV.2018.00208

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

As an important complement to person re-identification, text-based person search in large-scale databases is of great interest for person search applications. Given a language description of a person, existing frameworks search the dataset for images that depict the same person by computing an affinity score between the description and each image. In this paper, we first propose an efficient patch-word matching model that accurately captures the local matching details between image and text. In particular, it computes the affinity between an image and a word as the affinity of the image's best-matching patch toward that word. Compared with the state-of-the-art framework, it achieves competitive performance with a lower-complexity structure. In addition, we identify a significant limitation of affinity-based models: they are overly sensitive to the matching degree of a corresponding image-word pair. To address this limitation, we introduce an adaptive threshold mechanism into the model. It automatically learns an adaptive threshold for each word and "compresses" the affinity score between a word and an image when the score exceeds the word's threshold. Extensive experiments on the benchmark dataset demonstrate the effectiveness of the proposed framework, which outperforms other approaches for text-based person search. To provide deeper insight into the proposed model, we visualize the matching details between spatial patches of images and words of texts on typical examples, and illustrate how the adaptive threshold mechanism compresses the affinity score and benefits the final ranking of different images toward a text description.
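The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only states that the word-image affinity is that of the best-matching patch and that scores above a per-word threshold are compressed. Everything else here is an assumption made for illustration: the dot product as the patch-word affinity, the piecewise-linear form of `compress` with its hypothetical `slope` parameter, the summation over words, and fixed thresholds in place of learned ones.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as an assumed patch-word affinity."""
    return sum(x * y for x, y in zip(a, b))


def word_image_affinity(patches: Sequence[Vector], word: Vector) -> float:
    """Affinity between an image and a word: the best-matching patch's score."""
    return max(dot(patch, word) for patch in patches)


def compress(score: float, threshold: float, slope: float = 0.1) -> float:
    """Hypothetical compression: above the threshold, the score grows
    with a reduced slope; at or below it, the score is unchanged."""
    if score <= threshold:
        return score
    return threshold + slope * (score - threshold)


def description_image_affinity(
    patches: Sequence[Vector],
    words: Sequence[Vector],
    thresholds: Sequence[float],
    slope: float = 0.1,
) -> float:
    """Assumed aggregation: sum of compressed word-image affinities."""
    return sum(
        compress(word_image_affinity(patches, word), threshold, slope)
        for word, threshold in zip(words, thresholds)
    )


# Toy example: two image patches and two word embeddings in 2-D.
patches = [[1.0, 0.0], [0.0, 2.0]]
words = [[0.0, 1.0], [1.0, 0.0]]
thresholds = [1.0, 5.0]  # in the paper these are learned per word
score = description_image_affinity(patches, words, thresholds, slope=0.5)
```

In this toy setting the first word matches the second patch strongly, so its score exceeds its threshold and is compressed, while the second word's score stays below its threshold and passes through unchanged. This limits how much a single very strong image-word match can dominate the overall ranking score.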

Citation

Pages: 1879-1887

Page count: 9
