Combination of temporal-channels correlation information and bilinear feature for action recognition

Cited by: 12
Authors
Cai, Jiahui [1]
Hu, Jianguo [2, 3]
Li, Shiren [1]
Lin, Jialing [1]
Wang, Jun [2]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China
[2] Sun Yat Sen Univ, Sch Microelect Sci & Technol, Zhuhai, Peoples R China
[3] Dev Res Inst Guangzhou Smart City, Guangzhou, Peoples R China
Keywords
Classification (of information) - Convolution;
DOI
10.1049/iet-cvi.2020.0023
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this study, the authors focus on improving the spatio-temporal representation ability of three-dimensional (3D) convolutional neural networks (CNNs) in the video domain. They observe two unfavourable issues: (i) the convolutional filters learn only local representations along the input channels and treat all channel-wise features equally, without emphasising the important ones; (ii) the traditional global average pooling layer captures only first-order statistics, ignoring finer-detail features that are useful for classification. To mitigate these problems, they propose two modules to boost the performance of 3D CNNs: a temporal-channel correlation (TCC) module and a bilinear pooling module. The TCC module captures inter-channel correlations over the temporal domain and generates channel-wise dependencies that adaptively re-weight the channel-wise features, so that the network can focus on learning important features. The bilinear pooling module captures more complex second-order statistics of deep features and generates a second-order classification vector. Combining the first-order and second-order classification vectors yields more accurate classification results. Extensive experiments show that adding the proposed modules to the I3D network consistently improves its performance and outperforms state-of-the-art methods. The code and models are available at https://github.com/caijh33/I3D_TCC_Bilinear.
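The abstract describes both modules only at a high level. The following is a minimal NumPy sketch, not the authors' implementation (their code is in the linked repository), assuming a single clip feature map of shape (C, T, H, W). The TCC gate here is a hypothetical squeeze-and-excitation-style design: it spatially pools each channel into a time series, measures how strongly each channel correlates with the others over time, and maps that descriptor through a bottleneck to sigmoid channel weights. The bilinear branch averages outer products of channel vectors over all spatio-temporal locations, then applies a signed square root and L2 normalisation, a common choice for bilinear pooling. Fusing the two classification vectors by averaging their softmax probabilities is also an assumption, and all weights (`w1`, `w2`, `fc1`, `fc2`) and helper names are hypothetical and randomly initialised.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def tcc_reweight(x, w1, w2, eps=1e-6):
    """Hypothetical temporal-channel-correlation gate (not the paper's exact design).

    x:  feature map of shape (C, T, H, W)
    w1: (C // r, C) bottleneck weights; w2: (C, C // r) expansion weights
    Returns the re-weighted feature map and the per-channel weights.
    """
    t = x.shape[1]
    s = x.mean(axis=(2, 3))                       # (C, T): one time series per channel
    s = s - s.mean(axis=1, keepdims=True)         # centre each channel over time
    s = s / (s.std(axis=1, keepdims=True) + eps)  # scale to unit variance over time
    corr = s @ s.T / t                            # (C, C) inter-channel correlation over time
    desc = np.abs(corr).mean(axis=1)              # (C,) average correlation strength per channel
    hidden = np.maximum(w1 @ desc, 0.0)           # bottleneck + ReLU
    weights = sigmoid(w2 @ hidden)                # (C,) gates in (0, 1)
    return x * weights[:, None, None, None], weights


def bilinear_pool(x, eps=1e-12):
    """Second-order pooling: average outer product of channel vectors."""
    c = x.shape[0]
    f = x.reshape(c, -1)                     # (C, N) with N = T * H * W locations
    b = (f @ f.T / f.shape[1]).ravel()       # (C * C,) averaged outer products
    b = np.sign(b) * np.sqrt(np.abs(b))      # signed square root
    return b / (np.linalg.norm(b) + eps)     # L2 normalisation


def classify(x, params):
    x, _ = tcc_reweight(x, params["w1"], params["w2"])
    first = x.mean(axis=(1, 2, 3))                # (C,) global average pooling (first order)
    logits1 = params["fc1"] @ first               # first-order classification vector
    logits2 = params["fc2"] @ bilinear_pool(x)    # second-order classification vector
    # Assumed fusion rule: average the two class-probability vectors.
    return 0.5 * (softmax(logits1) + softmax(logits2))


rng = np.random.default_rng(0)
C, T, H, W, r, K = 16, 8, 7, 7, 4, 10  # channels, frames, height, width, reduction, classes
params = {
    "w1": rng.standard_normal((C // r, C)) * 0.1,
    "w2": rng.standard_normal((C, C // r)) * 0.1,
    "fc1": rng.standard_normal((K, C)) * 0.1,
    "fc2": rng.standard_normal((K, C * C)) * 0.1,
}
x = rng.standard_normal((C, T, H, W))
probs = classify(x, params)
print(probs.shape, round(float(probs.sum()), 6))  # prints (10,) 1.0
```

Averaging probabilities keeps the fused output a valid distribution; summing the two logit vectors would be an equally plausible reading of "combining". Note that the second-order vector has C² entries, so at realistic channel widths its classifier is far larger than the first-order one.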
Pages: 634-641
Number of pages: 8