Action Recognition in Dark Videos Using Spatio-Temporal Features and Bidirectional Encoder Representations from Transformers

Cited by: 6
Authors
Singh H. [1 ]
Suman S. [1 ]
Subudhi B.N. [1 ]
Jakhetiya V. [1 ]
Ghosh A. [2 ]
Affiliations
[1] Indian Institute of Technology Jammu, Jammu
[2] Indian Statistical Institute Kolkata, Kolkata
Source
IEEE Transactions on Artificial Intelligence | 2023 / Vol. 4 / No. 6
Keywords
Action recognition; dark video; image processing
DOI
10.1109/TAI.2022.3221912
Abstract
Several research works have been developed in the area of action recognition. Unfortunately, when these algorithms are applied to low-light or dark videos, their performance degrades sharply. To improve action recognition in dark or low-light videos, this article develops an efficient deep 3-D convolutional neural network based action recognition model. The proposed algorithm follows two stages. In the first stage, the low-light videos are enhanced using zero-reference deep curve estimation, followed by a min-max sampling algorithm. In the second stage, an action classification network recognizes the actions in the enhanced videos. Within this network, the capabilities of the R(2+1)D architecture are explored for spatio-temporal feature extraction. Because the model's overall generalization performance depends on how well it captures long-range temporal structure in videos, which is essential for action recognition, a graph convolutional network is placed on top of R(2+1)D as the video feature encoder to capture long-term temporal dependencies of the extracted features. Finally, a bidirectional encoder representations from transformers (BERT) model classifies the actions from the 3-D features extracted from the enhanced video scenes. The effectiveness of the proposed action recognition scheme is verified on the ARID V1.0 and ARID V1.5 datasets. The proposed algorithm achieves 96.60% Top-1 and 99.88% Top-5 accuracy on ARID V1.0, and 86.93% Top-1 and 99.35% Top-5 accuracy on ARID V1.5. To corroborate these findings, the results obtained by the proposed scheme are compared with those of 15 state-of-the-art action recognition techniques. © 2020 IEEE.
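The first stage of the pipeline can be sketched in plain Python. The per-pixel light-enhancement curve below follows the published zero-reference deep curve estimation (Zero-DCE) formula, but the constant curve parameter `alpha`, the iteration count, and the min-max normalization shown here are illustrative assumptions; in Zero-DCE the curve parameters are predicted per pixel by a small network, and the paper's exact min-max sampling procedure may differ.

```python
def enhance_pixel(x, alpha=0.6, iterations=4):
    """Iteratively apply the Zero-DCE light-enhancement curve
    LE(x) = x + alpha * x * (1 - x), for intensity x in [0, 1].
    A constant alpha stands in for the per-pixel learned parameter."""
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)
    return x

def min_max_normalize(frame):
    """Rescale a flat list of intensities to [0, 1] -- one simple reading
    of a 'min-max' step; shown only for illustration."""
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return [0.0 for _ in frame]
    return [(v - lo) / (hi - lo) for v in frame]

# Example: a dim frame (intensities near 0) is brightened, then rescaled.
dark_frame = [0.02, 0.05, 0.10, 0.08]
enhanced = [enhance_pixel(v) for v in dark_frame]
normalized = min_max_normalize(enhanced)
```

The curve keeps enhanced intensities inside [0, 1] by construction, which is one reason Zero-DCE needs no paired ground-truth bright frames during training.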
Pages: 1461 - 1471
Page count: 10
Related Papers
50 records in total
  • [1] Action recognition using global spatio-temporal features derived from sparse representations
    Somasundaram, Guruprasad
    Cherian, Anoop
    Morellas, Vassilios
    Papanikolopoulos, Nikolaos
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2014, 123 : 1 - 13
  • [2] Human emotion recognition from videos using spatio-temporal and audio features
    Munaf Rashid
    S. A. R. Abu-Bakar
    Musa Mokji
    The Visual Computer, 2013, 29 : 1269 - 1275
  • [3] STE: Spatio-Temporal Encoder for Action Spotting in Soccer Videos
    Darwish, Abdulrahman
    El-Shabrawy, Tallal
    PROCEEDINGS OF THE 5TH ACM INTERNATIONAL WORKSHOP ON MULTIMEDIA CONTENT ANALYSIS IN SPORTS, MMSPORTS 2022, 2022, : 87 - 92
  • [4] Human emotion recognition from videos using spatio-temporal and audio features
    Rashid, Munaf
    Abu-Bakar, S. A. R.
    Mokji, Musa
    VISUAL COMPUTER, 2013, 29 (12): 1269 - 1275
  • [5] Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos
    Duta, Ionut Cosmin
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3205 - 3214
  • [6] Action Recognition Using Discriminative Spatio-Temporal Neighborhood Features
    Cheng, Shi-Lei
    Yang, Jiang-Feng
    Ma, Zheng
    Xie, Mei
    INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND INFORMATION SECURITY (CNIS 2015), 2015, : 166 - 172
  • [7] Action recognition using spatio-temporal regularity based features
    Goodhart, Taylor
    Yan, Pingkun
    Shah, Mubarak
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 745 - 748
  • [8] Human Action Recognition Based on Selected Spatio-Temporal Features via Bidirectional LSTM
    Li, Wenhui
    Nie, Weizhi
    Su, Yuting
    IEEE ACCESS, 2018, 6 : 44211 - 44220
  • [9] Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
    Duta, Ionut C.
    Ionescu, Bogdan
    Aizawa, Kiyoharu
    Sebe, Nicu
    MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 365 - 378
  • [10] Spatio-Temporal Adaptive Network With Bidirectional Temporal Difference for Action Recognition
    Li, Zhilei
    Li, Jun
    Ma, Yuqing
    Wang, Rui
    Shi, Zhiping
    Ding, Yifu
    Liu, Xianglong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5174 - 5185