A Cross-Scale Hierarchical Transformer With Correspondence-Augmented Attention for Inferring Bird's-Eye-View Semantic Segmentation

Cited by: 3

Authors:

Fang, Naiyu [1]

Qiu, Lemiao [1]

Zhang, Shuyou [1]

Wang, Zili [1]

Hu, Kerui [1]

Wang, Kang [2]

Affiliations:

[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China

[2] Hong Kong Polytech Univ, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024 / Vol. 25 / No. 07

Keywords:

Autonomous driving; bird's-eye-view semantic segmentation; cross-scale hierarchical transformer; correspondence-augmented attention

DOI:

10.1109/TITS.2023.3348795

Chinese Library Classification (CLC) number:

TU [Building Science]

Discipline classification code:

0813

Abstract:

As bird's-eye-view (BEV) semantic segmentation is simple to visualize and easy to handle, it has been applied in autonomous driving to provide surrounding information to downstream tasks. Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community because it relies on cheap devices and supports real-time processing. Recent work implemented this task by learning the content and position relationship via a Vision Transformer (ViT). However, its quadratic complexity confines relationship learning to the latent layer, leaving a scale gap that impedes the representation of fine-grained objects. In terms of information absorption, when position-related BEV features are represented, the weighted fusion of all view features introduces inconducive features that disturb the fusion of conducive ones. To tackle these issues, we propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inference. Specifically, we devise a hierarchical framework to refine the BEV feature representation, where the last feature size is only half that of the final segmentation. To offset the increase in computation caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way, and leverage the residual connection of BEV features to facilitate information transmission between scales. We propose correspondence-augmented attention to distinguish conducive and inconducive correspondences. It is implemented in a simple yet effective way, amplifying attention scores before the Softmax operation, so that the position-view-related attention scores are highlighted and the position-view-disrelated ones are suppressed. Extensive experiments demonstrate that our method achieves state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
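
The abstract states only that attention scores are amplified before the Softmax operation; it does not give the exact form. The following is a minimal sketch of that general idea, assuming a plain multiplicative gain (the function names, the `gain` parameter, and the example scores are hypothetical and are not taken from the paper). It shows how scaling scores before Softmax raises the weight of the strongest correspondence and lowers the weight of the weakest one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def amplified_attention(scores, gain=2.0):
    """Scale raw attention scores by `gain` before softmax.

    Assumption: a simple multiplicative gain stands in for the
    amplification step; the paper's actual formulation may differ.
    """
    return softmax([gain * s for s in scores])


# Hypothetical raw scores of one BEV query against four camera-view keys.
raw_scores = [2.0, 0.5, -1.0, 0.1]

plain = softmax(raw_scores)
amplified = amplified_attention(raw_scores, gain=3.0)

print("plain:    ", [round(w, 4) for w in plain])
print("amplified:", [round(w, 4) for w in amplified])
```

With a gain above 1, the weights remain a valid distribution, but the gap between high-scoring and low-scoring keys widens, which matches the highlight-and-suppress behavior the abstract describes in qualitative terms.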


Pages: 7726-7737

Page count: 12

39 items in total

[1] Camera-view supervision for bird's-eye-view semantic segmentation
Yang, Bowen
Yu, Linlin
Chen, Feng
FRONTIERS IN BIG DATA, 2024, 7
[2] Cross-scale sampling transformer for semantic image segmentation
Ma, Yizhe
Yu, Long
Lin, Fangjian
Tian, Shengwei
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2895 - 2907
[3] LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Bartoccioni, Florent
Zablocki, Eloi
Bursuc, Andrei
Perez, Patrick
Cord, Matthieu
Alahari, Karteek
CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1663 - 1672
[4] X-Align++: cross-modal cross-view alignment for Bird’s-eye-view segmentation
Shubhankar Borse
Marvin Klingner
Varun Ravi
Hong Cai
Abdulaziz Almuzairee
Senthil Yogamani
Fatih Porikli
Machine Vision and Applications, 2023, 34
[5] X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation
Borse, Shubhankar
Klingner, Marvin
Kumar, Varun Ravi
Cai, Hong
Almuzairee, Abdulaziz
Yogamani, Senthil
Porikli, Fatih
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 3286 - 3296
[6] Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation
Jiang, Feng
Gao, Heng
Qiu, Shoumeng
Zhang, Haiqiang
Wan, Ru
Pu, Jian
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 402 - 407
[7] Application of Dynamic Deformable Attention in Bird's-Eye-View Detection
Gu, Weihao
Ai, Rui
Liu, Jinlong
Fan, Lili
Cao, Dongpu
Zhang, Kai
IEEE JOURNAL OF RADIO FREQUENCY IDENTIFICATION, 2022, 6 : 886 - 890
[8] 3D Bird's-Eye-View Instance Segmentation
Elich, Cathrin
Engelmann, Francis
Kontogianni, Theodora
Leibe, Bastian
PATTERN RECOGNITION, DAGM GCPR 2019, 2019, 11824 : 48 - 61
[9] Bird's Eye View Semantic Segmentation based on Improved Transformer for Automatic Annotation
Liang, Tianjiao
Pan, Weiguo
Bao, Hong
Fan, Xinyue
Li, Han
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2023, 17 (08) : 1996 - 2015
[10] Progressive Temporal Transformer for Bird's-Eye-View Camera Pose Estimation
Wu, Zhuoyuan
Cai, Jiancheng
Huang, Ranran
Liu, Xinmin
Chai, Zhenhua
NEURAL INFORMATION PROCESSING, ICONIP 2023, PT VI, 2024, 14452 : 133 - 147
