Transductive Zero-Shot Action Recognition via Visually Connected Graph Convolutional Networks

Cited by: 16
Authors
Xu, Yangyang [1 ]
Han, Chu [2 ]
Qin, Jing [3 ]
Xu, Xuemiao [1 ,4 ,5 ,6 ]
Han, Guoqiang [1 ]
He, Shengfeng [1 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Guangdong Acad Med Sci, Guangdong Prov Peoples Hosp, Guangzhou 510080, Peoples R China
[3] Hong Kong Polytech Univ, Dept Nursing, Hong Kong, Peoples R China
[4] South China Univ Technol, Minist Educ, State Key Lab Subtrop Bldg Sci, Guangzhou 510006, Peoples R China
[5] South China Univ Technol, Key Lab Big Data & Intelligent Robot, Guangzhou 510006, Peoples R China
[6] South China Univ Technol, Guangdong Prov Key Lab Computat Intelligence & Cy, Guangzhou 510006, Peoples R China
Keywords
Visualization; Feature extraction; Semantics; Correlation; Computational modeling; Learning systems; Action recognition; graph convolutional network (GCN); zero-shot learning (ZSL)
DOI
10.1109/TNNLS.2020.3015848
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
With the explosive growth of action categories, zero-shot action recognition aims to extend a well-trained model to novel/unseen classes. To bridge the large knowledge gap between seen and unseen classes, in this brief, we visually associate unseen actions with seen categories in a visually connected graph, and knowledge is then transferred from the visual feature space to the semantic space via grouped attention graph convolutional networks (GAGCNs). In particular, we extract visual features for all actions and build a visually connected graph that attaches seen actions to visually similar unseen categories. Moreover, the proposed grouped attention mechanism exploits the hierarchical knowledge in the graph, enabling the GAGCN to propagate visual-semantic connections from seen actions to unseen ones. We extensively evaluate the proposed method on three data sets: HMDB51, UCF101, and NTU RGB+D. Experimental results show that the GAGCN outperforms state-of-the-art methods.
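As a rough illustration of the pipeline the abstract describes (not the authors' implementation), the sketch below builds a k-nearest-neighbor graph from cosine similarity of visual features and propagates them with a plain two-layer GCN into a semantic embedding space. The grouped attention mechanism is omitted, and all function names, dimensions, and the choice of k are hypothetical.

import numpy as np

def cosine_similarity(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def build_visual_graph(features, k=5):
    # Connect every action (seen or unseen) to its k most visually
    # similar actions, then symmetrically normalize the adjacency
    # matrix as in a standard GCN (Kipf & Welling).
    sim = cosine_similarity(features, features)
    np.fill_diagonal(sim, -np.inf)            # exclude self from kNN
    adj = np.zeros_like(sim)
    for i in range(len(sim)):
        nbrs = np.argsort(sim[i])[-k:]        # k nearest neighbors
        adj[i, nbrs] = 1.0
    adj = np.maximum(adj, adj.T)              # symmetrize edges
    adj += np.eye(len(adj))                   # add self-loops back
    d_inv_sqrt = np.diag(adj.sum(1) ** -0.5)
    return d_inv_sqrt @ adj @ d_inv_sqrt

def gcn_forward(a_hat, x, weights):
    # Two-layer GCN: propagate visual features over the visually
    # connected graph and regress them into the semantic space.
    h = np.maximum(a_hat @ x @ weights[0], 0)  # ReLU hidden layer
    return a_hat @ h @ weights[1]

# Toy run: 8 actions with 16-D visual features mapped to a 4-D
# semantic (e.g., word-embedding) space; weights are untrained.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
a_hat = build_visual_graph(feats, k=3)
w = [rng.normal(size=(16, 32)) * 0.1, rng.normal(size=(32, 4)) * 0.1]
sem = gcn_forward(a_hat, feats, w)
print(sem.shape)  # (8, 4): every node, seen or unseen, gets an embedding

Because unseen actions sit in the same graph as their visually similar seen neighbors, each GCN layer mixes their representations, which is the sense in which knowledge is transferred from seen to unseen categories in the sketch above.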
Pages: 3761-3769
Number of pages: 9