Batch Entropy Supervised Convolutional Neural Networks for Feature Extraction and Harmonizing for Action Recognition

Cited by: 9

Authors:

Hossain, Md Imtiaz [1]

Siddique, Ashraf [1]

Hossain, Md Alamgir [1]

Hossain, Md Delowar [1]

Huh, Eui-Nam [1]

Affiliations:

[1] Kyung Hee Univ, Dept Comp Sci & Engn, Yongin 17104, South Korea

Source:

IEEE ACCESS | 2020 / Vol. 8

Keywords:

Feature extraction; Entropy; Videos; Training; Skeleton; Histograms; Convolutional neural networks; BESS; HAFS; batch-entropy; augmentation; action recognition; feature fusion; harmonization; LSTM; MODEL;

DOI:

10.1109/ACCESS.2020.3037529

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Deep learning-based action recognition in videos has attracted considerable attention because of its remarkable performance in diverse applications. However, because of heterogeneous backgrounds and noisy spatio-temporal cues, extracting highly discriminative features remains challenging. To address this problem, numerous methods based on the attention mechanism and the skeleton modality have been published. Instead of focusing on data pre-processing, we focus on the feature map and concentrate on extracting highly discriminative features. First, we introduce the Batch-wise Entropy Supervised Stream (BESS) to extend feature discrimination similar to the uncertainty of the corresponding batch. Second, to obtain a more generalized model, we propose a Stream to Harmonize the feature discrimination by Augmenting both Features (HAFS) of ResNext101 and BESS. These two streams are effectively hallucinated into HAFS through distillation and feature fusion. We introduce a new metric to assess the characteristics of the feature map; this metric depicts the relationship between feature discrimination and recognition accuracy. Finally, we comprehensively evaluate our approach on two benchmark datasets, HMDB51 and UCF101. Experimental results demonstrate that extending and then harmonizing the feature discrimination is an effective way of generating highly discriminative features, and they indicate the superiority of the proposed technique over existing state-of-the-art methods.
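The abstract does not state how the batch-wise entropy is computed. As a purely illustrative sketch, assuming it refers to a Shannon entropy taken over a discrete distribution within one batch, the quantity could be computed as below. The function name `batch_entropy` and the choice of class labels as the discrete values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def batch_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`.

    Assumption for illustration only: `values` holds the discrete items of
    one batch (for example class labels). The paper may define its batch
    entropy over a different quantity.
    """
    total = len(values)
    if total == 0:
        return 0.0  # an empty batch carries no uncertainty
    counts = Counter(values)
    # H = -sum(p * log2(p)) over the observed items
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A batch with a single class has zero entropy; a batch split evenly
# between two classes has an entropy of one bit.
uniform_batch = ["run", "run", "run", "run"]
mixed_batch = ["run", "run", "jump", "jump"]
print(batch_entropy(uniform_batch), batch_entropy(mixed_batch))
```

Under this assumption, a higher value indicates a more uncertain (more mixed) batch, which is the sense in which the abstract relates feature discrimination to batch uncertainty.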


Pages: 206427-206444

Page count: 18

Total references: 77
