A Hybrid Approach Based on GAN and CNN-LSTM for Aerial Activity Recognition

Times Cited: 14
Authors
Bousmina, Abir [1 ]
Selmi, Mouna [1 ]
Ben Rhaiem, Mohamed Amine [1 ,2 ]
Farah, Imed Riadh [1 ]
Affiliations
[1] Univ Manouba, Natl Sch Comp Sci, RIADI Lab, Manouba 2010, Tunisia
[2] Natl Mapping & Remote Sensing Ctr, CNCT, Tunis 2045, Tunisia
Keywords
UAVs; human action recognition; deep learning; CNN-LSTM; data augmentation; WGAN-GP;
DOI
10.3390/rs15143626
Chinese Library Classification
X [Environmental Science, Safety Science];
Subject Classification Codes
08; 0830;
Abstract
Unmanned aerial vehicles (UAVs), commonly known as drones, have played a significant role in recent years in building resilient smart cities. Thanks to their high mobility and reasonable price, UAVs can be used for a wide range of applications, including emergency response, civil protection, search and rescue, and surveillance. Automatic recognition of human activity in aerial videos captured by drones is critical for many tasks in these applications. However, it is difficult due to factors specific to aerial views, including camera motion, vibration, low resolution, background clutter, lighting conditions, and viewpoint variations. Although deep learning approaches have demonstrated their effectiveness in a variety of challenging vision tasks, they require either a large number of labelled aerial videos for training or a dataset with balanced classes, both of which can be difficult to obtain. To address these challenges, a hybrid data augmentation method is proposed that combines basic data transformations with a Wasserstein Generative Adversarial Network (GAN)-based feature augmentation method. In particular, basic transformations are first applied to increase the number of videos in the dataset. A Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) model is then used to learn the spatio-temporal dynamics of actions, and a GAN-based technique is applied to generate synthetic CNN-LSTM features conditioned on action classes, which provides highly discriminative spatio-temporal features (see the illustrative sketch below). We tested our model on the YouTube Aerial dataset, achieving encouraging results that surpass previous state-of-the-art works, with an accuracy of 97.83%.
Pages: 20
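
The abstract describes two learnable components: a CNN-LSTM model that turns a frame sequence into a spatio-temporal feature vector, and a class-conditional Wasserstein GAN with gradient penalty (WGAN-GP) that synthesises additional feature vectors per action class. The following is a minimal sketch, assuming PyTorch and torchvision (>= 0.13), of how such a pipeline could be wired; it is not the authors' implementation, and the ResNet-18 backbone, layer sizes, class count, and penalty weight are illustrative placeholders.

# A minimal sketch, assuming PyTorch/torchvision; NOT the paper's released code.
# Backbone, dimensions, class count (8) and penalty weight (10.0) are hypothetical.
import torch
import torch.nn as nn
from torchvision import models


class CNNLSTMExtractor(nn.Module):
    """Per-frame CNN features fed to an LSTM; the last hidden state is the clip feature."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        backbone = models.resnet18(weights=None)  # a pretrained backbone would normally be used
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the classification head
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, clips):                                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)       # (B*T, feat_dim)
        _, (h, _) = self.lstm(feats.view(b, t, -1))            # run the frame sequence
        return h[-1]                                           # (B, hidden_dim)


class Generator(nn.Module):
    """Maps noise plus a one-hot action label to a synthetic CNN-LSTM-like feature."""
    def __init__(self, z_dim=100, n_classes=8, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_classes, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim))

    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))


class Critic(nn.Module):
    """Scores (feature, label) pairs; no sigmoid, as required by the Wasserstein loss."""
    def __init__(self, in_dim=256, n_classes=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + n_classes, 512), nn.LeakyReLU(0.2),
                                 nn.Linear(512, 1))

    def forward(self, f, y_onehot):
        return self.net(torch.cat([f, y_onehot], dim=1))


def gradient_penalty(critic, real, fake, y):
    """WGAN-GP term: pushes the critic's gradient norm towards 1 on interpolated features."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    mix = (eps * real.detach() + (1 - eps) * fake.detach()).requires_grad_(True)
    grad = torch.autograd.grad(critic(mix, y).sum(), mix, create_graph=True)[0]
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()


if __name__ == "__main__":
    extractor, gen, critic = CNNLSTMExtractor(), Generator(), Critic()
    clips = torch.randn(4, 8, 3, 112, 112)             # 4 dummy clips of 8 RGB frames
    labels = torch.eye(8)[torch.randint(0, 8, (4,))]   # one-hot action labels
    real = extractor(clips)                            # real spatio-temporal features, (4, 256)
    fake = gen(torch.randn(4, 100), labels)            # synthetic features for the same classes
    gp = gradient_penalty(critic, real, fake, labels)
    d_loss = (critic(fake.detach(), labels).mean()
              - critic(real.detach(), labels).mean() + 10.0 * gp)   # one critic step's loss
    print(real.shape, fake.shape, float(d_loss))

In such a setup, synthetic features from the generator would be pooled with the real CNN-LSTM features to rebalance under-represented action classes before training the final classifier, while the basic transformations mentioned in the abstract (typically geometric or photometric changes to the raw clips) are applied before feature extraction.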