Feature and Model Level Fusion of Pretrained CNN for Remote Sensing Scene Classification

Cited by: 43
Authors
Du, Peijun [1, 2]
Li, Erzhu [3]
Xia, Junshi [4]
Samat, Alim [5]
Bai, Xuyu [1, 2]
Affiliations
[1] Nanjing Univ, Dept Geog Informat Sci, Key Lab Satellite Mapping Technol & Applicat, State Adm Surveying Mapping & Geoinformat China, Nanjing 210023, Jiangsu, Peoples R China
[2] Nanjing Univ, Jiangsu Ctr Collaborat Innovat Geog Informat Reso, Nanjing 210023, Jiangsu, Peoples R China
[3] Jiangsu Normal Univ, Sch Geog Geomat & Planning, Xuzhou 221116, Jiangsu, Peoples R China
[4] RIKEN, RIKEN Ctr Adv Intelligence Projec, Tokyo 1030027, Japan
[5] Chinese Acad Sci, Xinjiang Inst Ecol & Geog, State Key Lab Desert & Oasis Ecol, Urumqi 830011, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks (CNNs); feature fusion; multiscale improved Fisher kernel; scene classification; subspace learning; RETRIEVAL; NETWORKS;
DOI
10.1109/JSTARS.2018.2878037
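Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract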
Convolutional neural networks (CNNs) have attracted tremendous attention in the remote sensing community due to their excellent performance in different domains. For remote sensing scene classification in particular, CNN-based methods have brought a great breakthrough. However, it is not feasible to fully design and train a new CNN model for remote sensing scene classification, as this usually requires a large number of training samples and high computational cost. To alleviate these limitations of fully training a new model, some studies use pretrained CNN models as feature extractors to build feature representations of scene images for classification, and have achieved impressive results. In this scheme, how to construct the feature representation of a scene image from the pretrained CNN model becomes the key step. Existing studies have paid little attention to building more discriminative feature representations by exploring the potential benefits of multilayer features from a single CNN model and of different feature representations from multiple CNN models. To this end, this paper presents a fusion strategy that builds the feature representation of scene images by integrating multilayer features of a single pretrained CNN model, and extends it to a framework of multiple CNN models. For these purposes, a multiscale improved Fisher kernel coding method is used to build feature representations of scene images on convolutional layers, and a feature fusion approach based on two feature subspace learning methods [principal component analysis (PCA)/spectral regression kernel discriminant analysis and PCA/spectral regression kernel locality preserving projection] is proposed to construct the final fused features for scene classification. For validation and comparison, the proposed approaches are evaluated on two challenging high-resolution remote sensing datasets and show competitive performance compared with existing state-of-the-art baselines such as fully trained CNN models, fine-tuned CNN models, and other related works.
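The fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays are hypothetical stand-ins for per-image descriptors from two CNN layers, the multiscale improved Fisher kernel coding step is skipped entirely, and the spectral regression kernel stage (discriminant analysis or locality preserving projection) is replaced by plain concatenation after PCA. All function names are illustrative assumptions.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def l2_normalize(features, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def fuse_features(feature_blocks, n_components):
    """Reduce each block with PCA, normalize it, and concatenate the results.

    Assumption: simple concatenation stands in for the second subspace
    learning stage described in the abstract.
    """
    reduced = [
        l2_normalize(pca_reduce(block, n_components))
        for block in feature_blocks
    ]
    return np.concatenate(reduced, axis=1)


rng = np.random.default_rng(0)
n_images = 20
# Hypothetical stand-ins: one descriptor per image from a convolutional layer
# and one from a fully connected layer of a pretrained CNN.
conv_features = rng.normal(size=(n_images, 64))
fc_features = rng.normal(size=(n_images, 32))

fused = fuse_features([conv_features, fc_features], n_components=8)
print(fused.shape)  # (20, 16)
```

In a real pipeline the PCA projection would be fitted on training images only and then applied to test images, and the fused vectors would be passed to a classifier; both steps are omitted here for brevity.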
Pages: 2600-2611
Page count: 12