Detecting Visual Relationships Using Box Attention

Cited by: 35

Authors:

Kolesnikov, Alexander [1, 2]

Kuznetsova, Alina [1]

Lampert, Christoph H. [2]

Ferrari, Vittorio [1]

Affiliations:

[1] Google Research, Mountain View, CA 94043 USA

[2] IST Austria, Klosterneuburg, Austria

Source:

2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) | 2019

DOI:

10.1109/ICCVW.2019.00217

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a new model for detecting visual relationships, such as "person riding motorcycle" or "bottle on table". This task is an important step towards comprehensive structured image understanding, going beyond detecting individual objects. Our main novelty is a Box Attention mechanism that allows modeling pairwise interactions between objects using standard object detection pipelines. The resulting model is conceptually clean, expressive, and relies on well-justified training and prediction procedures. Moreover, unlike previously proposed approaches, our model does not introduce any additional complex components or hyperparameters on top of those already required by the underlying detection model. We conduct an experimental evaluation on two datasets, V-COCO and Open Images, demonstrating strong quantitative and qualitative results.

Pages: 1749-1753

Page count: 5

