Video scene analysis: an overview and challenges on deep learning algorithms

Cited by: 0
Authors
Qaisar Abbas
Mostafa E. A. Ibrahim
M. Arfan Jaffar
Affiliations
[1] Al Imam Muhamad Ibn Saud Islamic University, Department of Computer Science
[2] Benha University, Benha Faculty of Engineering
Source
Multimedia Tools and Applications | 2018 / Vol. 77
Keywords
Deep learning; Computer vision; Video processing; Activity classification; Scene interpretation; Video description; Video captioning
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Video scene analysis is a recent research topic of vital importance in many applications, such as real-time vehicle activity tracking, pedestrian detection, surveillance, and robotics. Despite its popularity, video scene analysis remains an open and challenging task that requires more accurate algorithms. In the last few years, however, advances in deep learning algorithms for video scene analysis have emerged to address the problem of real-time processing. This paper reviews recent developments in deep learning and video scene analysis problems. It also briefly describes the most recently used datasets along with their limitations. Moreover, the review provides a detailed overview of the particular challenges in real-time video scene analysis as they relate to activity recognition, scene interpretation, and video description/captioning. Finally, the paper summarizes future trends and challenges in video scene analysis tasks and offers the authors' insights to inspire further research efforts.
Pages: 20415-20453
Page count: 38
Related papers
215 in total