Counterfactual Critic Multi-Agent Training for Scene Graph Generation

Cited by: 112
Authors
Chen, Long [1 ]
Zhang, Hanwang [2 ]
Xiao, Jun [1 ]
He, Xiangnan [3 ]
Pu, Shiliang [4 ]
Chang, Shih-Fu [5 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, DCD Lab, Hangzhou, Peoples R China
[2] Nanyang Technol Univ, MReal Lab, Singapore, Singapore
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Hikvis Res Inst, Hangzhou, Peoples R China
[5] Columbia Univ, DVMM Lab, New York, NY 10027 USA
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
DOI
10.1109/ICCV.2019.00471
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Scene graphs - objects as nodes and visual relationships as edges - describe the whereabouts and interactions of objects in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects. For example, "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the confidence of the two objects. However, we argue that the visual context is not properly learned by the prevailing cross-entropy based supervised learning paradigm, which is not sensitive to graph inconsistency: errors at hub and non-hub nodes should not be penalized equally. To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach. CMAT is a multi-agent policy gradient method that frames objects as cooperative agents and directly maximizes a graph-level metric as the reward. In particular, to assign the reward properly to each agent, CMAT uses a counterfactual baseline that disentangles the agent-specific reward by fixing the predictions of the other agents. Extensive validations on the challenging Visual Genome benchmark show that CMAT achieves state-of-the-art performance, with significant gains under various settings and metrics.
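To make the counterfactual baseline described in the abstract more concrete, the sketch below shows one way such an agent-specific advantage can be computed: the reward of the jointly sampled scene graph minus the expected reward obtained by varying only one agent's object-class prediction while the other agents' predictions are held fixed. This is a minimal PyTorch sketch under stated assumptions; the names (counterfactual_advantage, graph_reward_fn) and the toy reward are illustrative, not the authors' released implementation, and the actual CMAT reward is a graph-level metric such as Recall@K on the generated scene graph.

```python
# Hypothetical sketch of a counterfactual-baseline advantage for multi-agent
# policy gradient over object-class predictions (names are assumptions).
import torch

def counterfactual_advantage(class_logits, sampled_classes, graph_reward_fn):
    """
    class_logits:    (num_agents, num_classes) logits, one agent per detected object.
    sampled_classes: (num_agents,) class indices sampled from the current policy.
    graph_reward_fn: maps a full class assignment -> scalar graph-level reward.
    Returns per-agent advantages: reward of the joint prediction minus a
    counterfactual baseline that marginalizes only that agent's prediction
    while the other agents' predictions stay fixed.
    """
    num_agents, num_classes = class_logits.shape
    probs = torch.softmax(class_logits, dim=-1)

    joint_reward = graph_reward_fn(sampled_classes)
    advantages = torch.empty(num_agents)

    for i in range(num_agents):
        # Expected reward over agent i's alternative classes, others held fixed.
        baseline = 0.0
        for c in range(num_classes):
            counterfactual = sampled_classes.clone()
            counterfactual[i] = c
            baseline = baseline + probs[i, c] * graph_reward_fn(counterfactual)
        advantages[i] = joint_reward - baseline
    return advantages

# Toy usage: 3 objects, 5 candidate classes, reward = fraction of objects
# matching a fixed target labeling (a stand-in for a real graph-level metric).
logits = torch.randn(3, 5)
sampled = torch.multinomial(torch.softmax(logits, dim=-1), 1).squeeze(-1)
target = torch.tensor([0, 2, 4])
reward_fn = lambda classes: (classes == target).float().mean().item()
adv = counterfactual_advantage(logits, sampled, reward_fn)
```

Because the baseline for agent i depends only on agent i's own policy (with the other agents fixed), subtracting it reduces variance without biasing the policy gradient, which is the role the counterfactual critic plays in the paper's training scheme.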
Pages: 4612 - 4622
Number of pages: 11