Detecting Visual Relationships Using Box Attention

Cited by: 35
Authors
Kolesnikov, Alexander [1, 2]
Kuznetsova, Alina [1 ]
Lampert, Christoph H. [2 ]
Ferrari, Vittorio [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] IST Austria, Klosterneuburg, Austria
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW) | 2019
DOI
10.1109/ICCVW.2019.00217
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new model for detecting visual relationships, such as "person riding motorcycle" or "bottle on table". This task is an important step towards comprehensive structured image understanding, going beyond detecting individual objects. Our main novelty is a Box Attention mechanism that makes it possible to model pairwise interactions between objects using standard object detection pipelines. The resulting model is conceptually clean, expressive, and relies on well-justified training and prediction procedures. Moreover, unlike previously proposed approaches, our model does not introduce any additional complex components or hyperparameters on top of those already required by the underlying detection model. We conduct an experimental evaluation on two datasets, V-COCO and Open Images, demonstrating strong quantitative and qualitative results.
Pages: 1749-1753
Page count: 5