Batch Entropy Supervised Convolutional Neural Networks for Feature Extraction and Harmonizing for Action Recognition

Times cited: 9

Authors:

Hossain, Md Imtiaz [1]

Siddique, Ashraf [1]

Hossain, Md Alamgir [1]

Hossain, Md Delowar [1]

Huh, Eui-Nam [1]

Affiliation:

[1] Kyung Hee Univ, Dept Comp Sci & Engn, Yongin 17104, South Korea

Source:

IEEE Access | 2020 / Vol. 8

Keywords:

Feature extraction; Entropy; Videos; Training; Skeleton; Histograms; Convolutional neural networks; BESS; HAFS; batch-entropy; augmentation; action recognition; feature fusion; harmonization; LSTM; MODEL

DOI:

10.1109/ACCESS.2020.3037529

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep learning-based action recognition in videos has attracted much attention because of its remarkable performance in diverse applications. However, because of heterogeneous backgrounds and noisy spatio-temporal cues, extracting highly discriminative features remains challenging. To address this problem, numerous methods based on the attention mechanism and the skeleton modality have been published. Rather than focusing on data pre-processing, we turn our attention to the feature map and concentrate on extracting highly discriminative features. First, we introduce the Batch-wise Entropy Supervised Stream (BESS), which extends feature discrimination according to the uncertainty of the corresponding batch. Second, to obtain a more generalized model, we propose a Stream to Harmonize the feature discrimination by Augmenting both Features (HAFS) of ResNeXt101 and BESS. These two streams are effectively hallucinated into HAFS through distillation and feature fusion. We also introduce a new metric to assess the characteristics of the feature map; this metric captures the relationship between feature discrimination and recognition accuracy. Finally, we comprehensively evaluate our approach on two benchmark datasets, HMDB51 and UCF101. Experimental results demonstrate that extending and then harmonizing feature discrimination is an effective way to generate highly discriminative features, and that the proposed technique outperforms existing state-of-the-art methods.
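This record does not say how BESS measures the uncertainty of a batch. One common way to quantify it is the mean Shannon entropy of the per-sample softmax predictions, normalized by log(C) for C classes. The pure-Python sketch below computes that quantity. It is an illustrative assumption, not the authors' BESS formulation, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_batch_entropy(batch_logits: Sequence[Sequence[float]]) -> float:
    """Mean per-sample prediction entropy divided by log(C), so it lies in [0, 1].

    Hypothetical proxy for "batch uncertainty" (assumption, not from the paper):
    0 means every sample is predicted with certainty, 1 means every
    prediction is uniform over the classes.
    """
    if not batch_logits:
        raise ValueError("batch must contain at least one sample")
    num_classes = len(batch_logits[0])
    if num_classes < 2:
        raise ValueError("need at least two classes")
    mean_h = sum(entropy(softmax(row)) for row in batch_logits) / len(batch_logits)
    return mean_h / math.log(num_classes)


if __name__ == "__main__":
    uncertain = [[0.0, 0.0, 0.0]]   # uniform prediction over 3 classes
    confident = [[10.0, 0.0, 0.0]]  # near one-hot prediction
    print(normalized_batch_entropy(uncertain))              # 1.0 up to rounding
    print(normalized_batch_entropy(confident))              # close to 0
    print(normalized_batch_entropy(uncertain + confident))  # roughly 0.5
```

A value like this could, for instance, scale an auxiliary loss term during training. The record itself does not state how BESS uses batch entropy, so any such weighting remains an assumption.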


Pages: 206427–206444

Page count: 18

Number of references: 77
