Unbiased Scene Graph Generation Using Predicate Similarities

Cited by: 0

Authors:

Matsui, Yusuke [1]

Ohashi, Misaki [1]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Bunkyo Ku, Tokyo 1138656, Japan

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Task analysis; Knowledge transfer; Feature extraction; Visualization; Training; Computer vision; Transfer learning; Bioinformatics; Genomics; Classification algorithms; Scene classification; Scene graph; unbiased generation; predicate similarities; transfer learning; long-tailed distribution; SMOTE;

DOI:

10.1109/ACCESS.2024.3424230

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Scene graphs are widely used in computer vision as graphical representations of the relationships between objects in images. However, these applications have not yet reached a practical stage of development, owing to biased training caused by long-tailed predicate distributions. Many recent studies have tackled this problem. In contrast, relatively few works have considered predicate similarities as a unique dataset feature that also leads to biased predictions. Because of this feature, infrequent predicates (e.g., "parked on", "covered in") are easily misclassified as closely related frequent predicates (e.g., "on", "in"). Utilizing predicate similarities, we propose a new classification scheme that branches the process into several fine-grained classifiers for groups of similar predicates. The classifiers aim to capture the differences among similar predicates in detail. We also introduce the idea of transfer learning to enhance the features of predicates that lack sufficient training samples to learn descriptive representations. Our goal is to improve the average precision scores even for instances with tail predicates. The results of extensive experiments on the Visual Genome dataset show that combining our method with an existing debiasing approach greatly improves performance on tail predicates in the challenging SGCls/SGDet tasks. Nonetheless, the overall performance of the proposed approach does not reach that of the current state of the art, so further analysis remains necessary as future work.
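The branching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how groups are formed or how the classifiers are built, so the group table `PREDICATE_GROUPS`, the helper `predict_predicate`, and the toy scores below are all hypothetical and only show the general two-stage pattern (choose a group of similar predicates, then choose a predicate within that group).

```python
from typing import Dict, List, Sequence

# Hypothetical predicate groups, built from the examples in the abstract.
# The paper's actual grouping is not given here.
PREDICATE_GROUPS: Dict[str, List[str]] = {
    "on": ["on", "parked on", "standing on"],
    "in": ["in", "covered in"],
}


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first index on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def predict_predicate(
    group_scores: Dict[str, float],
    fine_scores: Dict[str, Sequence[float]],
) -> str:
    """Pick the best-scoring group, then the best predicate inside it."""
    group = max(group_scores, key=group_scores.get)
    members = PREDICATE_GROUPS[group]
    return members[argmax(fine_scores[group])]


# Toy scores standing in for the outputs of trained classifiers.
group_scores = {"on": 2.1, "in": 0.3}
fine_scores = {"on": [0.4, 1.7, 0.2], "in": [0.9, 0.1]}

prediction = predict_predicate(group_scores, fine_scores)
print(prediction)
```

In this sketch the fine-grained stage only has to separate predicates within one group (for example "on" versus "parked on"), which is the kind of distinction the abstract says is easily lost when a single classifier is trained on a long-tailed distribution.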

Citation:

Pages: 95507-95516

Page count: 10

References: 46 in total

