MiM-UNet: An efficient building image segmentation network integrating state space models

Citations: 0
Authors
Liu, Dong [1, 2, 3]
Wang, Zhiyong [2, 4]
Liang, Ankai [5]
Affiliations
[1] Shandong Youth Univ Polit Sci, Sch Informat Engn, Jinan 250103, Peoples R China
[2] Shandong Prov Engn Res Ctr New Qual Prod & Data As, Jinan, Peoples R China
[3] New Technol Res & Dev Ctr Intelligent Informat Con, Jinan, Peoples R China
[4] Shandong Youth Univ Polit Sci, Sch Accountancy, Jinan 250103, Peoples R China
[5] TikTok Inc, San Jose, CA 95110 USA
Keywords
Building segmentation; Complex terrain; State space models; Remote sensing images; Deep learning; U-Net architecture; Semantic segmentation
DOI
10.1016/j.aej.2025.02.035
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
With the advancement of remote sensing technology, the analysis of complex terrain images has become crucial for urban planning and geographic information extraction. However, existing models face significant challenges in processing intricate building structures: Transformer-based models suffer from high computational complexity and memory demands, while Convolutional Neural Networks (CNNs) often struggle to capture features across multiple scales and hierarchical levels. To address these limitations, we propose a novel architecture, Mamba-in-Mamba U-Net (MiM-UNet), which integrates the design principles of state-space models (SSMs) to enhance both computational efficiency and feature extraction capacity. Specifically, MiM-UNet refines the traditional encoder-decoder framework by introducing Mamba-in-Mamba blocks, enabling precise multi-scale feature capture and efficient information fusion. Experimental results demonstrate that MiM-UNet outperforms state-of-the-art models in segmentation accuracy on the Massachusetts building dataset, while substantially reducing computational overhead, highlighting its superior performance and promising potential for practical applications.
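As a rough illustration of why state-space models scale well with sequence length, the following minimal sketch runs a diagonal linear state-space recurrence in a single pass over the input. It is not the MiM-UNet block from the paper: the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, and the parameters are fixed rather than input-dependent as in Mamba-style selective SSMs.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    Assumed (illustrative) update, elementwise over the state dimension:
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    The cost is O(len(x) * len(a)), i.e. linear in sequence length.
    """
    h = [0.0] * len(a)  # hidden state starts at zero
    y: List[float] = []
    for x_t in x:
        # Update every state channel from its previous value and the input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output as a weighted sum of the state.
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


if __name__ == "__main__":
    # An impulse input decays geometrically through a single state channel.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0]))
```

Each step touches only the current input and a fixed-size state, so the work grows linearly with the number of tokens or pixels, whereas full self-attention compares every position with every other position.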
Pages: 648-656
Number of pages: 9