Quaternion Relation Embedding for Scene Graph Generation

Cited by: 17

Authors:

Wang, Zheng ^{[1,2,3]}

Xu, Xing ^{[1,2]}

Wang, Guoqing ^{[1,2]}

Yang, Yang ^{[1,2]}

Shen, Heng Tao ^{[1,2,4]}

Affiliations:

[1] Univ Elect Sci & Technol China, Ctr Future Multimedia, Chengdu 611731, Peoples R China

[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

[3] UESTC, Inst Elect & Informat Engn, Chengdu 523808, Guangdong, Peoples R China

[4] Peng Cheng Lab, Shenzhen 518066, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Keywords:

Scene graph generation; interaction modeling; quaternion space; Hamilton product; visual relation detection

DOI:

10.1109/TMM.2023.3239229

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

As an important visual understanding task, scene graph generation has drawn widespread attention and could boost a broad range of downstream vision applications. Traditional scene graph generation methods based on different context refinements are trained with the probabilistic chain rule, which treats objects and relationships as independent entities. Despite their remarkable progress, such a plain formulation ignores the latent geometric structure of entities and relationships. To address this issue, we move beyond the traditional real-valued representations and use Quaternion Relation Embedding (QuatRE) to generate scene graphs with more expressive hypercomplex representations. More specifically, we introduce quaternion representations, which are hypercomplex-valued with three imaginary components, for object entities, and then formulate the relation triplets with the Hamilton product. Because it explicitly models the latent inter-dependencies among all imaginary components and has strong expressive capacity, our proposed QuatRE method can better capture the interactions between entities. More importantly, QuatRE can be treated as a plug-in and generalizes well to other methods for performance improvement, as it involves no additional layers. Finally, extensive comparisons of our proposed method against state-of-the-art methods on two large-scale and widely used datasets, i.e., Visual Genome and Open Images, demonstrate its superiority and generalization capability on various metrics for biased or unbiased inference.


Pages: 8646-8656

Page count: 11
