Constrained Structure Learning for Scene Graph Generation

Cited by: 6
Authors
Liu, Daqi [1]
Bober, Miroslaw [1]
Kittler, Josef [1]
Affiliations
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Scene graph generation; structured prediction; mean field variational Bayesian; message passing; constrained optimization;
DOI
10.1109/TPAMI.2023.3282889
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a structured prediction task, scene graph generation aims to build a visually grounded scene graph that explicitly models the objects in an input image and the relationships between them. Currently, the mean field variational Bayesian framework is the de facto methodology used by existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, such a formulation fails to explore other inference strategies and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message-passing strategy, a generic constrained optimization method, entropic mirror descent, is utilized to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.
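For readers unfamiliar with entropic mirror descent, the following is a minimal sketch of the generic technique on a toy problem, not the paper's model or implementation. It assumes a single categorical distribution q constrained to the probability simplex and a hypothetical variational free energy F(q) = sum_i q_i * e_i + sum_i q_i * log q_i, whose minimizer is known to be softmax(-e). The function names, the energies, and the step size are all illustrative assumptions.

```python
import math

def entropic_mirror_descent(grad, x0, step=0.5, iters=100):
    """Minimize a function over the probability simplex.

    Each iteration applies the multiplicative update
    x_i <- x_i * exp(-step * g_i), followed by renormalization,
    which keeps the iterate on the simplex without a projection step.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Do the update in log space and subtract the max for stability.
        logits = [math.log(xi) - step * gi for xi, gi in zip(x, g)]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        z = sum(w)
        x = [wi / z for wi in w]
    return x

def free_energy_grad(energies):
    """Gradient of F(q) = <q, e> + sum_i q_i log q_i (toy objective)."""
    def grad(q):
        return [e + math.log(qi) + 1.0 for e, qi in zip(energies, q)]
    return grad

# Hypothetical energies for three candidate labels.
energies = [1.0, 2.0, 0.5]
q0 = [1.0 / 3.0] * 3  # start from the uniform distribution
q = entropic_mirror_descent(free_energy_grad(energies), q0)
print(q)
```

On this toy objective the iterates approach softmax(-e), so the result can be checked against the closed form. The simplex constraint is handled by the multiplicative update itself, which is why this family of methods is a natural fit for inference over normalized distributions.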
Pages: 11588-11599
Number of pages: 12