Attention-Based Video Disentangling and Matching Network for Zero-Shot Action Recognition

Cited by: 0
Authors
Su, Yong [1]
Zhu, Shuang [1]
Xing, Meng [1]
Xu, Hengpeng [1]
Li, Zhengtao [1]
Affiliations
[1] Tianjin Normal Univ, Coll Elect & Commun Engn, Tianjin 300387, Peoples R China
Source
COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, VOL. 1 | 2022 / Vol. 878
Keywords
ZSAR; Attention-based; Disentangling; Relationship learning
DOI
10.1007/978-981-19-0390-8_45
Chinese Library Classification number
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Zero-Shot Action Recognition (ZSAR) is achieved by learning the mapping between the visual space and the semantic space. Existing methods usually use a state-of-the-art (SOTA) backbone network to construct the visual space. These methods have two major limitations. First, the human motion information that is crucial for action recognition is easily confused with the background. Second, the key information that reflects the correlation between actions may be lost because of the redundancy in video sequences. In this paper, we propose an Attention-based Video Disentangling and Matching Network (AVDMN) to address these problems. Specifically, we propose a video disentangling mechanism that decomposes segment-wise video into a background stream and a human motion stream. To further highlight the correlation between actions, we design an attention module that extracts the key components of these two streams. Finally, a relationship learning module is introduced to learn and measure the distance or similarity between video representations and action labels. Experiments on three realistic action benchmarks, Olympic Sports, HMDB51, and UCF101, show that the proposed architecture achieves favorable performance among ZSAR methods.
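The matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' AVDMN: the abstract describes a learned relationship module, and plain cosine similarity is swapped in here for simplicity. The function names, the `query` vector, the toy feature values, and the label names "archery" and "diving" are all hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segments, query):
    """Pool segment features into one vector.

    Each segment is weighted by softmax(dot(segment, query)); `query` is a
    hypothetical stand-in for whatever the attention module would learn.
    """
    scores = [sum(a * b for a, b in zip(seg, query)) for seg in segments]
    weights = softmax(scores)
    dim = len(segments[0])
    return [sum(w * seg[i] for w, seg in zip(weights, segments))
            for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_unseen(video_vec, label_embeddings):
    """Return the label whose embedding is most similar to the video vector."""
    return max(label_embeddings,
               key=lambda name: cosine(video_vec, label_embeddings[name]))


# Toy data (assumed): three segment features already mapped into a 2-D
# semantic space, and two unseen-class label embeddings in the same space.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.0]
labels = {"archery": [1.0, 0.1], "diving": [0.0, 1.0]}

video_vec = attention_pool(segments, query)
prediction = predict_unseen(video_vec, labels)
```

In this toy setup the attention weights favor the two segments aligned with `query`, so the pooled vector lies closer to the "archery" embedding than to "diving". A real system would learn both the visual-to-semantic mapping and the similarity function from seen classes.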
Pages: 368-375
Page count: 8