A Cross-Scale Hierarchical Transformer With Correspondence-Augmented Attention for Inferring Bird's-Eye-View Semantic Segmentation

Cited: 3
Authors
Fang, Naiyu [1 ]
Qiu, Lemiao [1 ]
Zhang, Shuyou [1 ]
Wang, Zili [1 ]
Hu, Kerui [1 ]
Wang, Kang [2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
[2] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Keywords
Autonomous driving; bird's-eye-view semantic segmentation; cross-scale hierarchical transformer; correspondence-augmented attention;
DOI
10.1109/TITS.2023.3348795
Chinese Library Classification
TU [Building Science];
Discipline Code
0813;
Abstract
As bird's-eye-view (BEV) semantic segmentation is simple to visualize and easy to handle, it has been applied in autonomous driving to provide surrounding information to downstream tasks. Inferring BEV semantic segmentation conditioned on multi-camera-view images is a popular scheme in the community because it requires only cheap devices and supports real-time processing. Recent work implements this task by learning the content and position relationship via a Vision Transformer (ViT). However, its quadratic complexity confines the relationship learning to the latent layer, leaving a scale gap that impedes the representation of fine-grained objects. From the viewpoint of information absorption, when representing position-related BEV features, the weighted fusion over all view features lets inconducive features disturb the fusion of conducive ones. To tackle these issues, we propose a novel cross-scale hierarchical Transformer with correspondence-augmented attention for semantic segmentation inference. Specifically, we devise a hierarchical framework to refine the BEV feature representation, where the last scale is only half the size of the final segmentation. To offset the computation increase caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way, and leverage the residual connection of BEV features to facilitate information transmission between scales. We propose correspondence-augmented attention to distinguish conducive and inconducive correspondences. It is implemented in a simple yet effective way: attention scores are amplified before the Softmax operation, so that position-view-related attention scores are highlighted and position-view-unrelated ones are suppressed. Extensive experiments demonstrate that our method achieves state-of-the-art performance in inferring BEV semantic segmentation conditioned on multi-camera-view images.
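The abstract's core idea of amplifying attention scores before the Softmax can be illustrated with a minimal sketch. This is NOT the paper's implementation: the function name, the binary correspondence mask, and the additive `boost` bias (which multiplies the post-Softmax weights of related entries by exp(boost)) are all illustrative assumptions about one plausible reading of "amplifying attention scores before the Softmax operation".

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_augmented_attention(q, k, v, corr_mask, boost=2.0):
    """Sketch of correspondence-augmented attention (assumed formulation).

    q:         (Nq, d) BEV position queries
    k, v:      (Nk, d) camera-view features
    corr_mask: (Nq, Nk) binary map, 1 where a BEV position corresponds
               to a camera view (position-view-related), 0 otherwise
    boost:     hypothetical bias added to related scores before the
               Softmax; it scales their post-Softmax weights by exp(boost)
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)        # standard scaled dot-product scores
    scores = scores + boost * corr_mask  # highlight related, suppress the rest
    attn = softmax(scores, axis=-1)      # rows still sum to 1
    return attn @ v, attn

# Toy usage: 4 BEV queries fusing 6 view features of dimension 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
mask = (rng.random((4, 6)) < 0.5).astype(float)
out, attn = correspondence_augmented_attention(q, k, v, mask)
```

Because the bias enters before the Softmax, the normalization is preserved: boosting related entries necessarily redistributes weight away from unrelated ones, which matches the abstract's "highlighted and suppressed" description.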
Pages: 7726-7737
Page count: 12
Related Papers
39 items in total
  • [21] Based on cross-scale fusion attention mechanism network for semantic segmentation for street scenes
    Ye, Xin
    Gao, Lang
    Chen, Jichen
    Lei, Mingyue
    FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [22] Efficient Semantic Segmentation for Visual Bird's-Eye View Interpretation
    Saemann, Timo
    Amende, Karl
    Milz, Stefan
    Witt, Christian
    Simon, Martin
    Petzold, Johannes
    INTELLIGENT AUTONOMOUS SYSTEMS 15, IAS-15, 2019, 867 : 679 - 688
  • [23] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers
    Xu, Runsheng
    Tu, Zhengzhong
    Xiang, Hao
    Shao, Wei
    Zhou, Bolei
    Ma, Jiaqi
    CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 989 - 1000
  • [24] Development of Segmentation Technology for Fall Risk Areas in Small-Scale Construction Sites Based on Bird's-eye-view Images
    Jong-ho, Na
    Jae-kang, Lee
    Hyu-soung, Shin
    Il-dong, Yun
    SENSORS AND MATERIALS, 2024, 36 (09) : 4017 - 4028
  • [25] UAVSeg: Dual-Encoder Cross-Scale Attention Network for UAV Images' Semantic Segmentation
    Wang, Zhen
    You, Zhu-Hong
    Xu, Nan
    Zhang, Chuanlei
    Huang, De-Shuang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [26] BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs
    Peng, Lang
    Chen, Zhirong
    Fu, Zhangjie
    Liang, Pengpeng
    Cheng, Erkang
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5924 - 5932
  • [27] A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View
    Kim, Curie
    Kim, Ue-Hwan
    2023 20TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS, UR, 2023, : 41 - 47
  • [28] RSBEV: Multiview Collaborative Segmentation of 3-D Remote Sensing Scenes With Bird's-Eye-View Representation
    Lin, Baihong
    Zou, Zhengxia
    Shi, Zhenwei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [29] Bird's-Eye View Semantic Segmentation for Autonomous Driving through the Large Kernel Attention Encoder and Bilinear-Attention Transform Module
    Li, Ke
    Wu, Xuncheng
    Zhang, Weiwei
    Yu, Wangpengfei
    WORLD ELECTRIC VEHICLE JOURNAL, 2023, 14 (09):
  • [30] DVT: Decoupled Dual-Branch View Transformation for Monocular Bird's Eye View Semantic Segmentation
    Du, Jiayuan
    Pan, Xianghui
    Shen, Mengjiao
    Su, Shuai
    Yang, Jingwei
    Liu, Chengju
    Chen, Qijun
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2024), 2024, : 9769 - 9776