Video Action Classification through Graph Convolutional Networks

Cited by: 2
Authors
Costa, Felipe F. [1 ]
Saito, Priscila T. M. [1 ]
Bugatti, Pedro H. [1 ]
Affiliations
[1] Univ Tecnol Fed Parana, Dept Comp, 1640 Alberto Carazzai Ave, Cornelio Procopio, Brazil
Source
VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 4: VISAPP | 2021
Keywords
Deep Learning; Graph Convolutional Network; Computer Vision; Action Classification
DOI
10.5220/0010321304900497
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video classification methods have evolved through proposals based on end-to-end deep learning architectures. Several works have shown that end-to-end models are effective at learning intrinsic video features, especially when compared with handcrafted ones. In general, convolutional neural networks are used for deep learning on videos. However, when applied in this context, these vanilla networks cannot capture variations based on temporal information. To do so, memory-based cells (e.g. long short-term memory) or optical flow techniques are used in conjunction with the convolutional process. Despite their effectiveness, those methods neglect global analysis, processing only a small number of frames per batch during learning and inference. Moreover, they completely ignore the semantic relationships between different videos that belong to the same context. The present work aims to fill these gaps through information grouping and contextual detection with graph-based convolutional neural networks. Experiments show that our method achieves up to 87% accuracy on a well-known public video dataset.
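The paper's exact architecture is not given in this record, so the following is only a minimal, illustrative sketch of the graph-based idea described in the abstract: videos become graph nodes, semantic similarity between their pooled CNN features defines the edges, and a graph convolution propagates context across related videos. The names (GraphConvLayer, knn_graph), the k-NN graph construction, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (PyTorch): graph convolution over video-level features.
# GraphConvLayer, knn_graph, and all dimensions are hypothetical; the paper's
# actual model is not described in this record.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConvLayer(nn.Module):
    """One GCN layer: H' = ReLU(A_hat @ H @ W), with A_hat the symmetrically
    normalized adjacency with self-loops (Kipf & Welling style)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a = adj + torch.eye(adj.size(0), device=adj.device)  # self-loops
        deg_inv_sqrt = a.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_hat = deg_inv_sqrt.unsqueeze(1) * a * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(a_hat @ h))


def knn_graph(features: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Connect each video to its k most similar videos (cosine similarity),
    so semantically related clips exchange information during convolution."""
    x = F.normalize(features, dim=1)
    sim = x @ x.t()
    neighbors = sim.topk(k + 1, dim=1).indices[:, 1:]  # drop the self-match
    adj = torch.zeros_like(sim).scatter_(1, neighbors, 1.0)
    return ((adj + adj.t()) > 0).float()  # symmetrize


# Usage: `feats` would come from a pretrained 2D/3D CNN pooled over each
# video's frames; the classifier head and training loop are omitted.
feats = torch.randn(32, 2048)                   # 32 videos, 2048-d features
adj = knn_graph(feats, k=5)
hidden = GraphConvLayer(2048, 512)(feats, adj)
logits = GraphConvLayer(512, 101)(hidden, adj)  # one logit per action class
```

In this reading, the graph supplies the "global analysis" the abstract refers to: class evidence flows between semantically related videos instead of being confined to the small frame batches of a single clip.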
Pages: 490-497
Page count: 8