Enhancing semantic scene segmentation for indoor autonomous systems using advanced attention-supported improved UNet

Cited by: 1

Authors:

Tran, Hoang N. [1]

Nguyen, Nghi V. [1]

Le, Nhi Q. P. [1]

Nguyen, Nam N. N. [1]

Le, Thu A. N. [1]

Nguyen, Vinh D. [1]

Affiliations:

[1] FPT Univ, Can Tho 94000, Vietnam

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2025, Vol. 19, Issue 02

Keywords:

Segmentation; UNet; Semantic segmentation; Attention mechanisms; Convolutional neural networks (CNN); EfficientNet; VISION;

DOI:

10.1007/s11760-024-03779-w

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Subject classification codes:

0808; 0809

Abstract:

This paper introduces EFFB7-UNet, a semantic segmentation framework for Indoor Autonomous Vision Systems (IAVSs) built on the U-Net architecture. The framework uses EfficientNetB7 as its encoder to strengthen feature extraction, and integrates a spatial and channel Squeeze-and-Excitation (scSE) attention block that emphasizes the most informative regions and feature channels to refine the segmentation output. The framework is evaluated on the NYUv2 dataset and on several augmented datasets. The study systematically compares EFFB7-UNet with U-Net variants built on other encoders, including ResNet50, ResNet101, MobileNet V2, VGG16, VGG19, and EfficientNets B0-B6. The results show that EFFB7-UNet is more accurate than these configurations and that the scSE attention block contributes to the improved segmentation results. Without using depth information, EFFB7-UNet achieves a 12% improvement in mean Intersection over Union (mIoU). This gain suggests that EFFB7-UNet adapts well across domains and can improve the effectiveness and reliability of IAVS technologies.
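For readers unfamiliar with the metric quoted above, mIoU is commonly computed as the per-class ratio of intersection to union between predicted and ground-truth label maps, averaged over classes. The following is a minimal sketch of that common definition, not the authors' evaluation code; the function name `mean_iou`, the flattened-list input format, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union for flattened integer label maps.

    Assumption (not from the paper): classes that appear in neither the
    prediction nor the ground truth are left out of the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where both maps agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 6-pixel example with 3 classes (hypothetical data).
    prediction = [0, 0, 1, 1, 2, 2]
    ground_truth = [0, 1, 1, 1, 2, 0]
    print(mean_iou(prediction, ground_truth, num_classes=3))
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. The abstract does not say whether the reported 12% gain is absolute or relative, so this sketch makes no claim about how that figure was derived.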

Pages: 11
