Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs

Cited by: 35
Authors
Bai, Lubin [1]
Huang, Weiming [2]
Zhang, Xiuyuan [1]
Du, Shihong [1]
Cong, Gao [2]
Wang, Haoyu [1]
Liu, Bo [1]
Affiliations
[1] Peking Univ, Inst Remote Sensing & GIS, Beijing, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China; National Research Foundation, Singapore
Keywords
Multi-modal Representation Learning; Remote sensing images; Point-of-interest; Urban Function; Population Density; Gross Domestic Products; Geospatial Pretraining; SENSING DATA FUSION; REMOTE
DOI
10.1016/j.isprsjprs.2023.05.006
Chinese Library Classification (CLC)
P9 [Physical Geography]
Subject classification codes
0705; 070501
Abstract
Most supervised geographic mapping methods based on very-high-resolution (VHR) images are designed for a specific task, which leads to high label dependency and poor task generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) that uses both VHR images and points-of-interest (POIs) to learn representations (regional vector embeddings) carrying both the physical and socio-economic properties of geographic regions. In MMGR, we employ an intra-modal and an inter-modal contrastive learning module: the former mines visual features by contrasting different augmentations of VHR images, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan, China) on three related yet distinct geographic mapping tasks (mapping urban functional distributions, population density, and gross domestic product) to verify the effectiveness of MMGR. The results show that MMGR considerably outperforms seven competitive baselines in all three tasks, which indicates its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning, even with only a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
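The two contrastive modules described in the abstract can be illustrated with a generic symmetric InfoNCE loss. This is a minimal sketch under stated assumptions, not the authors' implementation (which is in the repository linked above): the function name `info_nce`, the `temperature` value, the embedding sizes, and the synthetic embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss for paired embeddings.

    a, b: arrays of shape (n, d); row i of `a` and row i of `b` are a
    positive pair, and all other rows in the batch act as negatives.
    """
    # L2-normalise so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # shape (n, n)

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row; the target of
        # row i is column i (the matching pair).
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))

# Synthetic stand-ins for encoder outputs (assumption: 8 regions, 16 dims).
rng = np.random.default_rng(0)
n, d = 8, 16
image_view_1 = rng.normal(size=(n, d))
image_view_2 = image_view_1 + 0.05 * rng.normal(size=(n, d))  # augmented view
poi_features = rng.normal(size=(n, d))  # untrained, hence unaligned

intra_loss = info_nce(image_view_1, image_view_2)  # image vs. image
inter_loss = info_nce(image_view_1, poi_features)  # image vs. POI
total_loss = intra_loss + inter_loss  # equal weighting is an assumption
```

In this toy setting the intra-modal loss is small because the two image views are nearly identical, while the inter-modal loss is large because the random POI features are not yet aligned with the image features; training would aim to reduce both terms.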
Pages: 193-208
Page count: 16