Automatic Estimation for Visual Quality Changes of Street Space via Street-View Images and Multimodal Large Language Models

Cited by: 4
Authors
Liang, Hao [1]
Zhang, Jiaxin [2, 3]
Li, Yunqin [2, 3]
Wang, Bowen [4]
Huang, Jingyong [2]
Affiliations
[1] Nanjing Forestry Univ, Coll Landscape Architecture, Nanjing 210037, Peoples R China
[2] Nanchang Univ, Architecture & Design Coll, Nanchang 330031, Peoples R China
[3] Osaka Univ, Grad Sch Engn, Div Sustainable Energy & Environm Engn, Osaka 5650871, Japan
[4] Osaka Univ, Grad Sch Informat Sci & Technol, Osaka 5650871, Japan
Keywords
Visualization; Task analysis; Estimation; Deep learning; Image color analysis; Data models; Context modeling; Smart cities; Large language models; Smart city; visual quality; deep learning; multimodal large language models; CLASSIFICATION; CHALLENGES
DOI
10.1109/ACCESS.2024.3408843
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Estimating the Visual Quality of Street Space (VQoSS) is pivotal for urban design, environmental sustainability, and civic engagement, among other areas. Recent advances, notably in deep learning, have enabled large-scale analysis. However, traditional deep learning approaches are hampered by extensive data annotation requirements and limited adaptability across diverse VQoSS tasks. Multimodal Large Language Models (MLLMs) have recently demonstrated proficiency in various computer vision tasks, positioning them as promising tools for automated VQoSS assessment. In this paper, we pioneer the application of MLLMs to VQoSS change estimation, and our empirical findings affirm their effectiveness. In addition, we introduce the Street Quality Generative Pre-trained Transformer (SQ-GPT), a model that distills knowledge from GPT-4V, currently the most powerful such model but not freely accessible, without requiring human effort. SQ-GPT approaches GPT-4V's performance and is viable for large-scale VQoSS change estimation. In a case study of Nanjing, we showcase the practicality of SQ-GPT and the knowledge distillation pipeline. Our work promises to be a valuable asset for future urban studies research.
Pages: 87713-87727
Page count: 15