A Cross-Scale Hierarchical Transformer With Correspondence-Augmented Attention for Inferring Bird's-Eye-View Semantic Segmentation

Cited by: 3
Authors
Fang, Naiyu [1 ]
Qiu, Lemiao [1 ]
Zhang, Shuyou [1 ]
Wang, Zili [1 ]
Hu, Kerui [1 ]
Wang, Kang [2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
[2] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Keywords
Autonomous driving; bird's-eye-view semantic segmentation; cross-scale hierarchical transformer; correspondence-augmented attention
DOI
10.1109/TITS.2023.3348795
Chinese Library Classification
TU [Building Science]
Discipline Code
0813
Abstract
Because bird's-eye-view (BEV) semantic segmentation is simple to visualize and easy to handle, it has been applied in autonomous driving to provide surrounding information to downstream tasks. Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community because it relies on cheap devices and supports real-time processing. Recent work implemented this task by learning content and position relationships via a Vision Transformer (ViT). However, its quadratic complexity confines the relationship learning to the latent layer, leaving a scale gap that impedes the representation of fine-grained objects. From the perspective of information absorption, when representing position-related BEV features, the weighted fusion over all view features allows inconducive features to disturb the fusion of conducive features. To tackle these issues, we propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inference. Specifically, we devise a hierarchical framework to refine the BEV feature representation, where the last-scale feature is only half the size of the final segmentation. To limit the computational increase caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way and leverage residual connections of BEV features to facilitate information transmission between scales. We further propose correspondence-augmented attention to distinguish conducive and inconducive correspondences. It is implemented in a simple yet effective way: attention scores are amplified before the Softmax operation so that position-view-related scores are highlighted and position-view-unrelated scores are suppressed. Extensive experiments demonstrate that our method achieves state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
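The score-amplification idea in the abstract can be illustrated with a minimal PyTorch-style sketch; this is not the authors' code. It assumes a binary correspondence mask between BEV positions and camera-view features (the paper derives correspondences from camera geometry), and the function name, mask construction, and bias value delta are hypothetical.

import torch
import torch.nn.functional as F

def correspondence_augmented_attention(q, k, v, corr_mask, delta=1.0):
    # q: (B, Nq, D) BEV position queries; k, v: (B, Nk, D) multi-camera-view features.
    # corr_mask: (B, Nq, Nk) boolean, True where a BEV position is assumed to
    # geometrically correspond to a view feature (mask construction is an assumption here).
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (B, Nq, Nk)
    # Amplify scores before the Softmax: boost conducive (corresponding) pairs
    # and suppress inconducive (non-corresponding) pairs.
    scores = torch.where(corr_mask, scores + delta, scores - delta)
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Toy usage with random tensors.
B, Nq, Nk, D = 2, 16, 64, 32
q, k, v = torch.randn(B, Nq, D), torch.randn(B, Nk, D), torch.randn(B, Nk, D)
corr_mask = torch.rand(B, Nq, Nk) > 0.5
out = correspondence_augmented_attention(q, k, v, corr_mask)  # (B, Nq, D)

Because the bias is added before the Softmax, corresponding pairs receive exponentially larger weights than non-corresponding ones, which matches the abstract's description of highlighting conducive and suppressing inconducive correspondences.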
Pages: 7726-7737
Number of pages: 12