IS-GGT: Iterative Scene Graph Generation with Generative Transformers

Cited by: 12

Authors:

Kundu, Sanjoy [1]

Aakur, Sathyanarayanan N. [1]

Affiliations:

[1] Oklahoma State Univ, Dept Comp Sci, Stillwater, OK 74078 USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

National Science Foundation (USA)

Keywords:

DOI:

10.1109/CVPR52729.2023.00609

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection, to name a few. Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene, which adds computational overhead to the approach. This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. Using two transformer-based components, we first sample a possible scene graph structure from detected objects and their visual features. We then perform predicate classification on the sampled edges to generate the final scene graph. This approach allows us to efficiently generate scene graphs from images with minimal inference overhead. Extensive experiments on the Visual Genome dataset demonstrate the efficiency of the proposed approach. Without bells and whistles, we obtain, on average, 20.7% mean recall (mR@100) across different settings for scene graph generation (SGG), outperforming state-of-the-art SGG approaches while offering competitive performance to unbiased SGG approaches.
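The contrast the abstract draws between labeling every possible edge and classifying predicates only on sampled edges can be illustrated with a minimal sketch. Everything below is a toy stand-in and an assumption made for illustration: the function names, the thresholded `edge_score` lookup, and the example objects are hypothetical and are not the paper's transformer components or its data.

```python
from itertools import permutations


def dense_candidate_edges(num_objects):
    """All ordered (subject, object) pairs, as a generation-by-classification
    pipeline would have to label."""
    return list(permutations(range(num_objects), 2))


def sample_edges(num_objects, edge_score, threshold=0.5):
    """Keep only pairs whose (hypothetical) edge score clears the threshold.
    This lookup merely stands in for a learned graph-structure sampler."""
    return [
        (s, o)
        for s, o in dense_candidate_edges(num_objects)
        if edge_score(s, o) >= threshold
    ]


def classify_predicates(edges, predicate_fn):
    """Label only the sampled edges; returns (subject, predicate, object)."""
    return [(s, predicate_fn(s, o), o) for s, o in edges]


# Toy scene: indices refer to positions in `objects` (illustrative only).
objects = ["person", "horse", "hat", "field"]
linked = {(0, 1): "riding", (0, 2): "wearing", (1, 3): "standing on"}


def edge_score(s, o):
    # Hypothetical score: 1.0 for pairs that are related, 0.0 otherwise.
    return 1.0 if (s, o) in linked else 0.0


def predicate_fn(s, o):
    # Hypothetical predicate classifier backed by the same toy lookup.
    return linked[(s, o)]


dense = dense_candidate_edges(len(objects))
sampled = sample_edges(len(objects), edge_score)
triplets = classify_predicates(sampled, predicate_fn)

print(len(dense), "candidate edges vs", len(sampled), "sampled edges")
for s, p, o in triplets:
    print(objects[s], p, objects[o])
```

In this toy, the predicate step runs on 3 edges rather than all 12 ordered pairs. The sampler here still enumerates every pair in order to score it, so the sketch shows only where the predicate classifier's savings come from, not how the structure itself is generated.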


Pages: 6292-6301

Page count: 10

