Learning Deep Cross-Modal Embedding Networks for Zero-Shot Remote Sensing Image Scene Classification

Cited by: 74

Authors:

Li, Yansheng [1]

Zhu, Zhihui [2]

Yu, Jin-Gang [3]

Zhang, Yongjun [1]

Affiliations:

[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China

[2] Univ Denver, Dept Elect & Comp Engn, Denver, CO 80208 USA

[3] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2021 / Vol. 59 / No. 12

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Visualization; Semantics; Deep learning; Task analysis; Remote sensing; Feature extraction; Big Data; Latent space; locality-preservation deep cross-modal embedding networks (LPDCMENs); remote sensing (RS) imagery; transcendental knowledge; zero-shot RS scene classification (ZSRSSC); BUILT-UP AREAS; RECOGNITION; EXTRACTION;

DOI:

10.1109/TGRS.2020.3047447

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Subject classification codes:

0708 ; 070902 ;

Abstract:

Due to its wide applications, remote sensing (RS) image scene classification has attracted increasing research interest. When each category has a sufficient number of labeled samples, RS image scene classification can be well addressed by deep learning. However, in the RS big data era, it is extremely difficult or even impossible to annotate RS scene samples for all categories at one time, because RS scene classification often needs to be extended as new applications emerge that inevitably involve new classes of RS images. Hence, the RS big data era calls for a zero-shot RS scene classification (ZSRSSC) paradigm in which the classification model learned from the training RS scene categories has the inference ability to recognize RS image scenes from unseen categories, much like humans' evolutionary perception ability. Unfortunately, zero-shot classification remains largely unexplored in the RS field. This article proposes a novel ZSRSSC method based on locality-preservation deep cross-modal embedding networks (LPDCMENs). The proposed LPDCMENs, which can fully assimilate the pairwise intramodal and intermodal supervision in an end-to-end manner, aim to alleviate the problem of class structure inconsistency between two hybrid spaces (i.e., the visual image space and the semantic space). To pursue stability and generalization ability, which are highly desired for ZSRSSC, a set of explainable constraints is specially designed to optimize LPDCMENs. To fully verify the effectiveness of the proposed LPDCMENs, we collect a new large-scale RS scene data set, including the instance-level visual images and class-level semantic representations (RSSDIVCS), where general and domain knowledge is exploited to construct the class-level semantic representations. Extensive experiments show that the proposed ZSRSSC method based on LPDCMENs clearly outperforms the state-of-the-art methods, and that domain knowledge further improves the performance of ZSRSSC compared with general knowledge. The collected RSSDIVCS will be made publicly available along with this article.

Citation

Pages: 10590-10603

Page count: 14

