UNSUPERVISED MOTION REPRESENTATION ENHANCED NETWORK FOR ACTION RECOGNITION

Cited by: 2

Authors:

Yang, Xiaohang [1]

Kong, Lingtong [1]

Yang, Jie [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Action recognition; video classification; optical flow; unsupervised learning; feature pyramid

DOI:

10.1109/ICASSP39728.2021.9414222

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Learning reliable motion representations between consecutive frames, such as optical flow, has proven to benefit video understanding greatly. However, TV-L1, an effective optical flow solver, is time-consuming, and caching the extracted optical flow is expensive in storage. To fill this gap, we propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator. UF-TSN estimates motion cues from adjacent frames in a coarse-to-fine manner: it extracts a feature pyramid from each frame and, at each level, warps one frame's features toward the other's according to the flow estimated at the previous level, so that each level only needs to handle small displacements. Because action datasets lack labeled motion, we constrain the flow prediction with multi-scale photometric consistency and edge-aware smoothness. Compared with state-of-the-art unsupervised motion representation learning methods, our model achieves better accuracy while maintaining efficiency, and it is competitive with some supervised or more complicated approaches.
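The two unsupervised constraints named in the abstract, multi-scale photometric consistency and edge-aware smoothness, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Charbonnier penalty, the edge weight `alpha`, the smoothness weight `lam`, and the 2x average-pool pyramid are all assumptions chosen for illustration, and the sketch works on grayscale images rather than learned feature maps.

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img (H, W) by flow (H, W, 2) using bilinear sampling."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Sampling positions, clamped to the image border.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bot = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bot

def photometric_loss(frame1, frame2, flow, eps=1e-3):
    """Charbonnier penalty between frame1 and frame2 warped back by flow."""
    diff = frame1 - warp(frame2, flow)
    return float(np.mean(np.sqrt(diff ** 2 + eps ** 2)))

def smoothness_loss(frame1, flow, alpha=10.0):
    """First-order flow smoothness, down-weighted where the image has edges."""
    wx = np.exp(-alpha * np.abs(np.diff(frame1, axis=1)))[..., None]
    wy = np.exp(-alpha * np.abs(np.diff(frame1, axis=0)))[..., None]
    dx = np.abs(np.diff(flow, axis=1))
    dy = np.abs(np.diff(flow, axis=0))
    return float(np.mean(wx * dx) + np.mean(wy * dy))

def downsample(a):
    """2x2 average pooling over the two leading (spatial) axes."""
    h, w = a.shape[:2]
    a = a[: h // 2 * 2, : w // 2 * 2]
    return a.reshape(h // 2, 2, w // 2, 2, *a.shape[2:]).mean(axis=(1, 3))

def multi_scale_loss(frame1, frame2, flows, lam=0.1):
    """Sum both terms over pyramid levels; flows[k] is at 1/2**k resolution."""
    total = 0.0
    f1, f2 = frame1, frame2
    for flow in flows:
        total += photometric_loss(f1, f2, flow) + lam * smoothness_loss(f1, flow)
        f1, f2 = downsample(f1), downsample(f2)
    return total
```

In a trained model the per-level flows would come from the network's coarse-to-fine decoder; here `flows` is simply a list of arrays supplied by the caller, with `flows[0]` at full resolution.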

Citation

Pages: 2445-2449

Page count: 5

25 entries in total

[1] [Anonymous], 2018, CVPR, DOI 10.1109/CVPR.2018.00630

[2] [Anonymous], 2016, LECT NOTES COMP VIII

[3] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Carreira, Joao; Zisserman, Andrew. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733

[4] FlowNet: Learning Optical Flow with Convolutional Networks. Dosovitskiy, Alexey; Fischer, Philipp; Ilg, Eddy; Haeusser, Philip; Hazirbas, Caner; Golkov, Vladimir; van der Smagt, Patrick; Cremers, Daniel; Brox, Thomas. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2758-2766

[5] Learning Spatiotemporal Features with 3D Convolutional Networks. Du Tran; Bourdev, Lubomir; Fergus, Rob; Torresani, Lorenzo; Paluri, Manohar. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4489-4497

[6] Convolutional Two-Stream Network Fusion for Video Action Recognition. Feichtenhofer, Christoph; Pinz, Axel; Zisserman, Andrew. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 1933-1941

[7] UNSUPERVISED LEARNING FOR OPTICAL FLOW ESTIMATION USING PYRAMID CONVOLUTION LSTM. Guan, Shuosen; Li, Haoxin; Zheng, Wei-Shi. 2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019: 181-186

[8] FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. Ilg, Eddy; Mayer, Nikolaus; Saikia, Tonmoy; Keuper, Margret; Dosovitskiy, Alexey; Brox, Thomas. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1647-1655

[9] Jaderberg M, 2015, ADV NEUR IN, V28

[10] Kong LT, 2020, IEEE IMAGE PROC, P1501, DOI 10.1109/ICIP40778.2020.9191101
