Applications of graph convolutional networks in computer vision

Cited by: 40

Authors:

Cao, Pingping [1]

Zhu, Zeqi [1]

Wang, Ziyuan [1]

Zhu, Yanping [2]

Niu, Qiang [1]

Affiliations:

[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221006, Jiangsu, Peoples R China

[2] Missouri Univ Sci & Technol, Dept Civil Architectural & Environm Engn, 500 W 16th St, Rolla, MO 65409 USA

Source:

NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 34 / Issue 16

Funding:

National Natural Science Foundation of China;

Keywords:

Graph convolution network; Non-Euclidean space; Relational modeling; Computer vision;

DOI:

10.1007/s00521-022-07368-1

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Graph Convolutional Network (GCN), which models the potential relationships in non-Euclidean spatial data, has attracted considerable attention from deep learning researchers in recent years. It has been widely and successfully applied to computer vision tasks by modeling latent spatial, topological, semantic, and other information in Euclidean spatial data. To better understand the working principles of GCN and its future applications in computer vision, this study reviews the basic principles of GCN, summarizes the difficulties of using GCN in different visual tasks together with their solutions, and describes in detail the methods for constructing graphs from Euclidean spatial data in those tasks. The review divides the applications of GCN in basic visual tasks into image recognition, object detection, semantic segmentation, instance segmentation, and object tracking, and for each task it summarizes and compares the role and performance of GCN in detail. It emphasizes that applying GCN to computer vision faces three challenges: computational complexity, the paradigm for constructing graphs from Euclidean spatial data, and model interpretability. Finally, the review proposes two future trends for GCN in vision, namely lightweight models and the fusion of GCN with other models, to improve the performance of visual models and meet the higher requirements of vision tasks.
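As a rough illustration of the two ideas the abstract names (building a graph from Euclidean image data, then applying graph convolution), the following is a minimal NumPy sketch. It is not taken from the reviewed paper: the 4-neighbour pixel grid is only one assumed way to construct a graph, the propagation rule is the widely used symmetric-normalized form ReLU(D^-1/2 (A + I) D^-1/2 X W), and the helper names grid_adjacency and gcn_layer are hypothetical.

```python
import numpy as np


def grid_adjacency(height, width):
    """4-neighbour adjacency matrix for an image grid (one node per pixel).

    Assumed graph construction for illustration only.
    """
    n = height * width
    adj = np.zeros((n, n))
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:    # connect to the right neighbour
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < height:   # connect to the lower neighbour
                adj[i, i + width] = adj[i + width, i] = 1.0
    return adj


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees are >= 1
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)


rng = np.random.default_rng(0)
adj = grid_adjacency(2, 3)              # 2x3 "image" -> 6 nodes
features = rng.normal(size=(6, 4))      # 4 input features per node
weights = rng.normal(size=(4, 2))       # project to 2 output features
out = gcn_layer(adj, features, weights)
```

Each output row mixes a pixel's features with those of its grid neighbours, which is the relational modeling the abstract refers to; real vision pipelines typically build graphs over regions, labels, or detections rather than raw pixels.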


Pages: 13387-13405

Number of pages: 19

References: 108 in total
