FEXNet: Foreground Extraction Network for Human Action Recognition

Cited by: 29
Authors
Shen, Zhongwei [1]
Wu, Xiao-Jun [1]
Xu, Tianyang [1, 2]
Affiliations
[1] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi 214122, Jiangsu, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Spatiotemporal phenomena; Feature extraction; Three-dimensional displays; Solid modeling; Iron; Image recognition; Foreground-related features; spatiotemporal modeling; action recognition;
DOI
10.1109/TCSVT.2021.3103677
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Codes
0808; 0809
Abstract
As most human actions in video sequences embody continuous interactions between foregrounds rather than the background scene, it is essential to disentangle these foregrounds from the background for advanced action recognition systems. In this paper, we therefore propose a Foreground EXtraction (FEX) block that explicitly models foreground clues to achieve effective modeling of the action subjects. The designed FEX block contains two components. The first is a Foreground Enhancement (FE) module, which highlights the potential feature channels related to the action attributes, providing channel-level refinement for the subsequent spatiotemporal modeling. The second is a Scene Segregation (SS) module, which splits the feature maps into foreground and background parts. Specifically, a temporal model with dynamic enhancement is constructed for the foreground part, reflecting the essential nature of the action category, while the background is modeled with simple spatial convolutions that map the inputs to a consistent feature space. The FEX blocks can be inserted into existing 2D CNNs (denoted as FEXNet) for spatiotemporal modeling, concentrating on the foreground clues for effective action inference. Experiments on Something-Something V1, V2, and Kinetics400 verify the effectiveness of the proposed method.
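To make the abstract's description more concrete, the following PyTorch sketch illustrates one possible reading of the FEX block: a channel-gating FE module followed by an SS module that splits channels into a temporally modeled foreground and a spatially convolved background. All layer choices, the channel-split ratio, and the frame layout are assumptions made for illustration only, not the authors' published implementation.

import torch
import torch.nn as nn


class ForegroundEnhancement(nn.Module):
    # Channel-level gating that highlights action-related channels
    # (assumed here to be a squeeze-and-excitation style recalibration).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (N*T, C, H, W)
        w = x.mean(dim=(2, 3))                 # global average pooling -> (N*T, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # channel-reweighted features


class SceneSegregation(nn.Module):
    # Splits channels into a foreground part, modeled with a depth-wise
    # temporal convolution across frames, and a background part, modeled
    # with a plain spatial convolution (both are illustrative choices).
    def __init__(self, channels, num_frames, fg_ratio=0.5):
        super().__init__()
        self.t = num_frames
        self.fg_c = int(channels * fg_ratio)
        bg_c = channels - self.fg_c
        self.temporal = nn.Conv1d(self.fg_c, self.fg_c, kernel_size=3,
                                  padding=1, groups=self.fg_c)
        self.spatial = nn.Conv2d(bg_c, bg_c, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (N*T, C, H, W), frames contiguous per clip
        nt, _, h, w = x.shape
        fg, bg = x[:, :self.fg_c], x[:, self.fg_c:]
        # (N*T, Cf, H, W) -> (N*H*W, Cf, T) so Conv1d runs along the frame axis
        fg = fg.view(-1, self.t, self.fg_c, h, w).permute(0, 3, 4, 2, 1)
        fg = fg.reshape(-1, self.fg_c, self.t)
        fg = self.temporal(fg)
        fg = fg.view(nt // self.t, h, w, self.fg_c, self.t)
        fg = fg.permute(0, 4, 3, 1, 2).reshape(nt, self.fg_c, h, w)
        return torch.cat([fg, self.spatial(bg)], dim=1)


class FEXBlock(nn.Module):
    # FE followed by SS, wrapped in a residual connection so the block can be
    # inserted into an existing 2D CNN stage without changing tensor shapes.
    def __init__(self, channels, num_frames):
        super().__init__()
        self.fe = ForegroundEnhancement(channels)
        self.ss = SceneSegregation(channels, num_frames)

    def forward(self, x):
        return x + self.ss(self.fe(x))


if __name__ == "__main__":
    clip = torch.randn(2 * 8, 64, 28, 28)      # 2 clips x 8 frames, 64 channels
    out = FEXBlock(64, num_frames=8)(clip)
    print(out.shape)                           # torch.Size([16, 64, 28, 28])

In this reading, the residual wrapper keeps the block shape-preserving, so it can be dropped into any stage of a 2D backbone; the paper's actual FE and SS designs may differ in detail.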
Pages: 3141-3151 (11 pages)
Related Papers
50 records in total
  • [1] An Improved Action Recognition Network With Temporal Extraction and Feature Enhancement
    Jiang, Jie
    Zhang, Yi
    IEEE ACCESS, 2022, 10 : 13926 - 13935
  • [2] Appearance-and-Dynamic Learning With Bifurcated Convolution Neural Network for Action Recognition
    Zhang, Junxuan
    Hu, Haifeng
    Liu, Zheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (04) : 1593 - 1606
  • [3] Human Action Recognition Based on Foreground Trajectory and Motion Difference Descriptors
    Dong, Suge
    Hu, Daidi
    Li, Ruijun
    Ge, Mingtao
APPLIED SCIENCES-BASEL, 2019, 9 (10)
  • [4] Collaborative and Multilevel Feature Selection Network for Action Recognition
    Zheng, Zhenxing
    An, Gaoyun
    Cao, Shan
    Wu, Dapeng
    Ruan, Qiuqi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1304 - 1318
  • [5] Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network
    Ding, Wenwen
    Ding, Chongyang
    Li, Guang
    Liu, Kai
    IEEE ACCESS, 2021, 9 : 54078 - 54089
  • [6] Realistic action recognition with salient foreground trajectories
    Yi, Yang
    Zheng, Zhenxian
    Lin, Maoqing
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 75 : 44 - 55
  • [7] Saliency-based foreground trajectory extraction using multiscale hybrid masks for action recognition
    Zhang, Guoliang
    Jia, Songmin
    Zhang, Xiangyin
    Li, Xiuzhi
    JOURNAL OF ELECTRONIC IMAGING, 2018, 27 (05)
  • [8] A spatiotemporal and motion information extraction network for action recognition
    Wang, Wei
    Wang, Xianmin
    Zhou, Mingliang
    Wei, Xuekai
    Li, Jing
    Ren, Xiaojun
    Zong, Xuemei
    WIRELESS NETWORKS, 2024, 30 (06) : 5389 - 5405
  • [9] Inter-Dimensional Correlations Aggregated Attention Network for Action Recognition
    Li, Xiaochao
    Zhan, Jianhao
    Yang, Man
IEEE ACCESS, 2021, 9 (09): 105965 - 105973
  • [10] Dual attention convolutional network for action recognition
    Li, Xiaoqiang
    Xie, Miao
    Zhang, Yin
    Ding, Guangtai
    Tong, Weiqin
    IET IMAGE PROCESSING, 2020, 14 (06) : 1059 - 1065