A multimodal fusion framework for urban scene understanding and functional identification using geospatial data

Cited by: 16

Authors:

Su, Chen [1,2]

Hu, Xinli [1,2,3]

Meng, Qingyan [1,2,3]

Zhang, Linlin [1,2,3]

Shi, Wenxu [1,2]

Zhao, Maofan [1,2]

Affiliations:

[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Hainan Aerosp Informat Res Inst, Key Lab Earth Observat Hainan Prov, Sanya 572029, Peoples R China

Source:

INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION | 2024 / Vol. 127

Keywords:

Urban scene understanding; Urban function; Multimodal data; Remote sensing; REMOTE; CLASSIFICATION;

DOI:

10.1016/j.jag.2024.103696

Chinese Library Classification (CLC) number:

TP7 [Remote sensing technology];

Discipline classification codes:

081102 ; 0816 ; 081602 ; 083002 ; 1404 ;

Abstract:

Urban scene understanding and functional identification are essential for accurately characterizing spatial structure and optimizing city layouts during rapid urbanization. Multimodal data are important for recognizing the distribution patterns of urban functions and revealing their internal details. Previous studies have focused primarily on remote sensing imagery and points of interest (POIs) data, overlooking the role of building characteristics in determining the functions of urban scenes. These studies are also limited in how they mine and fuse multimodal features. To address these challenges, this study proposes a multimodal fusion framework that integrates remote sensing imagery, POIs, and building footprints for urban scene understanding and functional mapping. The framework employs a dual-branch model that extracts visual semantic features from the remote sensing imagery and socioeconomic features from auxiliary data, such as POIs and building footprints. A branch attention module is designed to assign weights to the dual-branch features. Additionally, a multiscale feature fusion module is introduced to extract and combine multiscale features through modal interaction. Experiments in Beijing and Chengdu validate the effectiveness of the proposed framework, with overall accuracies of 90.04% and 92.07% and kappa coefficients of 0.881 and 0.895, respectively. This study provides empirical evidence to support accurate urban planning and further promote sustainable urban development. The source code is available at: https://github.com/sssuchen/MMFF.
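The two metrics reported in the abstract, overall accuracy and the kappa coefficient, are both derived from a classification confusion matrix. The following is a minimal sketch of the standard definitions, not code from the MMFF repository. The function names and the 3-class matrix `toy_cm` are hypothetical and chosen only for illustration; the matrix does not reproduce any result from the paper.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement (identical to overall accuracy).
    p_o = sum(cm[i][i] for i in range(k)) / total
    # Chance agreement from the row and column marginals.
    row_sums = [sum(cm[i]) for i in range(k)]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical confusion matrix (rows: reference class, columns: predicted class).
toy_cm = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]

print("overall accuracy:", round(overall_accuracy(toy_cm), 4))
print("kappa:", round(cohen_kappa(toy_cm), 4))
```

For this toy matrix the overall accuracy is 0.86 and kappa is about 0.79. Kappa is lower because it discounts the agreement that would be expected by chance given the class marginals.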

Pages: 16