Deep feature extraction and motion representation for satellite video scene classification

Cited: 21
Authors
Gu, Yanfeng [1 ]
Liu, Huan [1 ]
Wang, Tengfei [1 ]
Li, Shengyang [2 ,3 ]
Gao, Guoming [1 ]
Affiliations
[1] Harbin Inst Technol, Sch Elect & Informat Engn, Harbin 150001, Peoples R China
[2] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Beijing 100094, Peoples R China
[3] Chinese Acad Sci, Key Lab Space Utilizat, Beijing 100094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
satellite videos; classification; convolutional neural network; CNN; long short-term memory; LSTM; motion representation; RECOGNITION; NETWORKS;
DOI
10.1007/s11432-019-2784-4
Chinese Library Classification
TP [Automation & Computer Technology];
Discipline code
0812 ;
Abstract
Satellite video scene classification (SVSC) is an advanced topic in remote sensing that determines the scene category of a satellite video. SVSC is an important and fundamental step in satellite video analysis and understanding, providing priors for the presence of objects and dynamic events. In this paper, a two-stage framework is proposed to extract spatial features and motion features for SVSC. The first stage extracts spatial features from satellite videos: representative frames are first selected based on blur detection and the spatial activity of the video, and a fine-tuned visual geometry group network (VGG-Net) is then transferred to extract spatial features from their content. The second stage builds a motion representation for satellite videos. First, the motion of moving targets is represented by the second temporal principal component of principal component analysis (PCA). Second, features from the first fully connected layer of VGG-Net serve as a high-level spatial representation of the moving targets. Third, a small long short-term memory (LSTM) network is further designed to encode temporal information. The two-stage features respectively characterize the spatial and temporal patterns of satellite scenes, and are finally fused for SVSC. A satellite video dataset is built for video scene classification, comprising 7209 video segments covering 8 scene categories; the videos come from the Jilin-1 satellites and UrtheCast. Experimental results demonstrate the effectiveness of the proposed framework for SVSC.
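The motion-representation step in the abstract (taking the second temporal principal component of PCA over the frame stack) can be sketched as follows. This is a minimal illustration of one plausible reading, not the paper's implementation: each pixel's time series is treated as a sample, the T x T temporal covariance is eigendecomposed, and pixels are projected onto the second eigenvector so that moving-target pixels stand out; the function name and toy data are assumptions for the demo.

```python
import numpy as np

def second_temporal_pc(video):
    """Motion map from the 2nd temporal principal component (sketch).

    video: float array of shape (T, H, W).
    Returns an (H, W) map whose large-magnitude entries mark pixels
    with strong temporal variation, i.e. candidate moving targets.
    """
    T, H, W = video.shape
    X = video.reshape(T, -1)                   # T frames x (H*W) pixels
    X = X - X.mean(axis=0, keepdims=True)      # center each pixel over time
    C = X @ X.T / (X.shape[1] - 1)             # small T x T temporal covariance
    vals, vecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    v2 = vecs[:, -2]                           # 2nd-largest temporal component
    motion = v2 @ X                            # project every pixel onto it
    return motion.reshape(H, W)

# Toy demo (assumed data): static noisy background plus one bright
# block that slides to the right, one column per frame.
rng = np.random.default_rng(0)
vid = rng.normal(0.0, 0.01, (16, 32, 32))
for t in range(16):
    vid[t, 10:14, t:t + 4] = 1.0
m = second_temporal_pc(vid)
print(m.shape)  # (32, 32)
```

Working with the T x T covariance instead of the (H*W) x (H*W) one keeps the eigendecomposition cheap even for high-resolution frames, since satellite video segments are short relative to their frame size.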
Pages: 15