Constrained Structure Learning for Scene Graph Generation

Cited by: 4
Authors
Liu, Daqi [1]
Bober, Miroslaw [1]
Kittler, Josef [1]
Affiliations
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Scene graph generation; structured prediction; mean field variational Bayesian; message passing; constrained optimization;
DOI
10.1109/TPAMI.2023.3282889
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
As a structured prediction task, scene graph generation aims to build a visually grounded scene graph that explicitly models the objects in an input image and their relationships. Existing methods rely predominantly on the mean field variational Bayesian framework, in which the unconstrained inference step is often implemented by a message passing neural network. However, such a formulation fails to explore other inference strategies and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message passing strategy, we use a generic constrained optimization method, entropic mirror descent, to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.
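The abstract names entropic mirror descent as the solver for the constrained variational inference step. The following is a minimal sketch of that general technique on a toy problem, not the authors' implementation: it minimizes an assumed free-energy-style objective, sum_i q_i * c_i + sum_i q_i * log(q_i), over the probability simplex. The function names, the cost vector `costs`, and the step size are all hypothetical choices made for illustration.

```python
import math


def entropic_mirror_descent(grad, q0, step_size=0.5, num_steps=100):
    """Minimize a function over the probability simplex.

    Each step is the multiplicative update q_i <- q_i * exp(-step_size * g_i),
    followed by renormalization, which keeps q non-negative and summing to 1.
    """
    q = list(q0)
    for _ in range(num_steps):
        g = grad(q)
        # Work in log-space and subtract the max for numerical stability.
        logits = [math.log(qi) - step_size * gi for qi, gi in zip(q, g)]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        q = [w / total for w in weights]
    return q


def free_energy_grad(costs):
    """Gradient of sum_i q_i * c_i + sum_i q_i * log(q_i) (toy objective)."""
    def grad(q):
        return [c + math.log(qi) + 1.0 for c, qi in zip(costs, q)]
    return grad


# Hypothetical per-label costs for a single categorical variable.
costs = [1.0, 2.0, 0.5]
q = entropic_mirror_descent(free_energy_grad(costs), [1.0 / 3.0] * 3)
```

For this toy objective the constrained minimizer is the softmax of the negated costs, so `q` can be checked against that closed form. The simplex constraint is never enforced by an explicit projection; it is preserved by the multiplicative form of the update itself.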
Pages: 11588-11599
Page count: 12