A Cross-Scale Hierarchical Transformer With Correspondence-Augmented Attention for Inferring Bird's-Eye-View Semantic Segmentation

Cited by: 3
Authors
Fang, Naiyu [1]
Qiu, Lemiao [1]
Zhang, Shuyou [1]
Wang, Zili [1]
Hu, Kerui [1]
Wang, Kang [2]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
[2] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Keywords
Autonomous driving; bird's-eye-view semantic segmentation; cross-scale hierarchical transformer; correspondence-augmented attention
DOI
10.1109/TITS.2023.3348795
Chinese Library Classification number
TU [Architectural Science]
Discipline classification code
0813
Abstract
Because bird's-eye-view (BEV) semantic segmentation is simple to visualize and easy to handle, it has been applied in autonomous driving to provide surrounding information to downstream tasks. Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community because it needs only cheap devices and supports real-time processing. Recent work implemented this task by learning the content and position relationships via a Vision Transformer (ViT). However, its quadratic complexity confines relationship learning to the latent layer, leaving a scale gap that impedes the representation of fine-grained objects. In terms of information absorption, when position-related BEV features are represented, the weighted fusion of all view features introduces inconducive features that disturb the fusion of conducive ones. To tackle these issues, we propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inference. Specifically, we devise a hierarchical framework to refine the BEV feature representation, in which the last feature size is only half that of the final segmentation. To limit the additional computation caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way, and leverage the residual connection of BEV features to facilitate information transmission between scales. We propose correspondence-augmented attention to distinguish conducive from inconducive correspondences. It is implemented in a simple yet effective way, by amplifying attention scores before the Softmax operation, so that position-view-related attention scores are highlighted and position-view-disrelated ones are suppressed. Extensive experiments demonstrate that our method achieves state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
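The abstract states only that attention scores are amplified before the Softmax operation; it does not give the amplification rule. The sketch below illustrates the general effect under an assumed rule, namely multiplying the raw scores by a constant `gain` greater than 1. The function names, the `gain` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def amplified_attention(scores, gain=2.0):
    """Scale raw scores by `gain` before softmax.

    Assumption: a constant multiplicative gain stands in for the paper's
    amplification step, whose exact form the abstract does not specify.
    """
    return softmax([gain * s for s in scores])


# Hypothetical raw scores of one BEV query against four camera-view keys.
raw = [2.0, 0.5, -1.0, 0.1]
plain = softmax(raw)
sharp = amplified_attention(raw, gain=2.0)

for name, weights in (("plain", plain), ("amplified", sharp)):
    print(name, [round(w, 3) for w in weights])
```

With a gain above 1, the largest score receives a larger share of the attention weight and the smaller scores receive less, which matches the highlight-and-suppress behavior the abstract describes in qualitative terms.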
Pages: 7726-7737
Number of pages: 12
Related papers
39 in total
  • [1] Camera-view supervision for bird's-eye-view semantic segmentation
    Yang, Bowen
    Yu, Linlin
    Chen, Feng
    FRONTIERS IN BIG DATA, 2024, 7
  • [2] Cross-scale sampling transformer for semantic image segmentation
    Ma, Yizhe
    Yu, Long
    Lin, Fangjian
    Tian, Shengwei
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2895 - 2907
  • [3] LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
    Bartoccioni, Florent
    Zablocki, Eloi
    Bursuc, Andrei
    Perez, Patrick
    Cord, Matthieu
    Alahari, Karteek
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1663 - 1672
  • [4] X-Align++: cross-modal cross-view alignment for Bird’s-eye-view segmentation
    Shubhankar Borse
    Marvin Klingner
    Varun Ravi
    Hong Cai
    Abdulaziz Almuzairee
    Senthil Yogamani
    Fatih Porikli
    Machine Vision and Applications, 2023, 34
  • [5] X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation
    Borse, Shubhankar
    Klingner, Marvin
    Kumar, Varun Ravi
    Cai, Hong
    Almuzairee, Abdulaziz
    Yogamani, Senthil
    Porikli, Fatih
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 3286 - 3296
  • [6] Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation
    Jiang, Feng
    Gao, Heng
    Qiu, Shoumeng
    Zhang, Haiqiang
    Wan, Ru
    Pu, Jian
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 402 - 407
  • [7] Application of Dynamic Deformable Attention in Bird's-Eye-View Detection
    Gu, Weihao
    Ai, Rui
    Liu, Jinlong
    Fan, Lili
    Cao, Dongpu
    Zhang, Kai
    IEEE JOURNAL OF RADIO FREQUENCY IDENTIFICATION, 2022, 6 : 886 - 890
  • [8] 3D Bird's-Eye-View Instance Segmentation
    Elich, Cathrin
    Engelmann, Francis
    Kontogianni, Theodora
    Leibe, Bastian
    PATTERN RECOGNITION, DAGM GCPR 2019, 2019, 11824 : 48 - 61
  • [9] Bird's Eye View Semantic Segmentation based on Improved Transformer for Automatic Annotation
    Liang, Tianjiao
    Pan, Weiguo
    Bao, Hong
    Fan, Xinyue
    Li, Han
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2023, 17 (08) : 1996 - 2015
  • [10] Progressive Temporal Transformer for Bird's-Eye-View Camera Pose Estimation
    Wu, Zhuoyuan
    Cai, Jiancheng
    Huang, Ranran
    Liu, Xinmin
    Chai, Zhenhua
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT VI, 2024, 14452 : 133 - 147