A Cross-Scale Hierarchical Transformer With Correspondence-Augmented Attention for Inferring Bird's-Eye-View Semantic Segmentation

Cited: 3
Authors
Fang, Naiyu [1 ]
Qiu, Lemiao [1 ]
Zhang, Shuyou [1 ]
Wang, Zili [1 ]
Hu, Kerui [1 ]
Wang, Kang [2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
[2] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Keywords
Autonomous driving; bird's-eye-view semantic segmentation; cross-scale hierarchical transformer; correspondence-augmented attention;
DOI
10.1109/TITS.2023.3348795
Chinese Library Classification
TU [Building Science];
Discipline Code
0813;
Abstract
As bird's-eye-view (BEV) semantic segmentation is simple to visualize and easy to handle, it has been applied in autonomous driving to provide surrounding information to downstream tasks. Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community because it requires only cheap devices and supports real-time processing. Recent work implements this task by learning the content and position relationship via a Vision Transformer (ViT). However, its quadratic complexity confines the relationship learning to the latent layer, leaving a scale gap that impedes the representation of fine-grained objects. From the viewpoint of information absorption, when representing position-related BEV features, the weighted fusion over all view features lets inconducive features disturb the fusion of conducive ones. To tackle these issues, we propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inference. Specifically, we devise a hierarchical framework to refine the BEV feature representation, where the last scale is only half the size of the final segmentation. To offset the computation increase caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way, and leverage the residual connection of BEV features to facilitate information transmission between scales. We propose correspondence-augmented attention to distinguish conducive and inconducive correspondences. It is implemented in a simple yet effective way: attention scores are amplified before the Softmax operation, so that position-view-related attention scores are highlighted and position-view-unrelated ones are suppressed. Extensive experiments demonstrate that our method achieves state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
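The abstract's core idea of amplifying attention scores before the Softmax can be illustrated with a minimal sketch. This is NOT the paper's implementation: the function name, the binary correspondence mask, and the additive `boost` bias (which multiplies the post-Softmax weights of related entries by exp(boost)) are all illustrative assumptions about one plausible reading of "amplifying attention scores before the Softmax operation".

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_augmented_attention(q, k, v, corr_mask, boost=2.0):
    """Sketch of correspondence-augmented attention (assumed formulation).

    q:         (Nq, d) BEV position queries
    k, v:      (Nk, d) camera-view features
    corr_mask: (Nq, Nk) binary map, 1 where a BEV position corresponds
               to a camera view (position-view-related), 0 otherwise
    boost:     hypothetical bias added to related scores before the
               Softmax; it scales their post-Softmax weights by exp(boost)
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)        # standard scaled dot-product scores
    scores = scores + boost * corr_mask  # highlight related, suppress the rest
    attn = softmax(scores, axis=-1)      # rows still sum to 1
    return attn @ v, attn

# Toy usage: 4 BEV queries fusing 6 view features of dimension 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
mask = (rng.random((4, 6)) < 0.5).astype(float)
out, attn = correspondence_augmented_attention(q, k, v, mask)
```

Because the bias enters before the Softmax, the normalization is preserved: boosting related entries necessarily redistributes weight away from unrelated ones, which matches the abstract's "highlighted and suppressed" description.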
Pages: 7726-7737
Page count: 12
Related Papers
39 items in total
  • [21] Based on cross-scale fusion attention mechanism network for semantic segmentation for street scenes
    Ye, Xin
    Gao, Lang
    Chen, Jichen
    Lei, Mingyue
    FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [22] Efficient Semantic Segmentation for Visual Bird's-Eye View Interpretation
    Saemann, Timo
    Amende, Karl
    Milz, Stefan
    Witt, Christian
    Simon, Martin
    Petzold, Johannes
    INTELLIGENT AUTONOMOUS SYSTEMS 15, IAS-15, 2019, 867 : 679 - 688
  • [23] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
    Xu, Runsheng
    Tu, Zhengzhong
    Xiang, Hao
    Shao, Wei
    Zhou, Bolei
    Ma, Jiaqi
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 989 - 1000
  • [24] Development of Segmentation Technology for Fall Risk Areas in Small-Scale Construction Sites Based on Bird's-eye-view Images
    Jong-ho, Na
    Jae-kang, Lee
    Hyu-soung, Shin
    Il-dong, Yun
    SENSORS AND MATERIALS, 2024, 36 (09) : 4017 - 4028
  • [25] UAVSeg: Dual-Encoder Cross-Scale Attention Network for UAV Images' Semantic Segmentation
    Wang, Zhen
    You, Zhu-Hong
    Xu, Nan
    Zhang, Chuanlei
    Huang, De-Shuang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [26] BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs
    Peng, Lang
    Chen, Zhirong
    Fu, Zhangjie
    Liang, Pengpeng
    Cheng, Erkang
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5924 - 5932
  • [27] A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View
    Kim, Curie
    Kim, Ue-Hwan
    2023 20TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS, UR, 2023, : 41 - 47
  • [28] RSBEV: Multiview Collaborative Segmentation of 3-D Remote Sensing Scenes With Bird's-Eye-View Representation
    Lin, Baihong
    Zou, Zhengxia
    Shi, Zhenwei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [29] Bird's-Eye View Semantic Segmentation for Autonomous Driving through the Large Kernel Attention Encoder and Bilinear-Attention Transform Module
    Li, Ke
    Wu, Xuncheng
    Zhang, Weiwei
    Yu, Wangpengfei
    WORLD ELECTRIC VEHICLE JOURNAL, 2023, 14 (09):
  • [30] DVT: Decoupled Dual-Branch View Transformation for Monocular Bird's Eye View Semantic Segmentation
    Du, Jiayuan
    Pan, Xianghui
    Shen, Mengjiao
    Su, Shuai
    Yang, Jingwei
    Liu, Chengju
    Chen, Qijun
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2024), 2024, : 9769 - 9776