Improving Action Recognition via Temporal and Complementary Learning

Cited by: 6
Authors
Elmadany, Nour Eldin [1, 2]
He, Yifeng [3, 4]
Guan, Ling [1 ]
Affiliations
[1] Ryerson Univ, Dept Elect Comp & Biomed Engn, 350 Victoria St, Toronto, ON M5B 2K3, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] Ryerson Univ, Toronto, ON, Canada
[4] 117 Micmac Cres, N York, ON M2H 2K1, Canada
Keywords
Deep ConvNets; two-stream networks; HISTOGRAMS; ATTENTION; FLOW;
DOI
10.1145/3447686
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this article, we study the problem of video-based action recognition. We improve the action recognition performance by finding an effective temporal and appearance representation. For capturing the temporal representation, we introduce two temporal learning techniques for improving long-term temporal information modeling, specifically Temporal Relational Network and Temporal Second-Order Pooling-based Network. Moreover, we harness the representation using complementary learning techniques, specifically Global-Local Network and Fuse-Inception Network. Performance evaluation on three datasets (UCF101, HMDB-51, and Mini-Kinetics-200) demonstrated the superiority of the proposed framework compared to the 2D Deep ConvNets-based state-of-the-art techniques.
Pages: 24