Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer

Cited by: 0
Authors
Wu, Yang [1 ]
Zhang, Kaihua [4 ,5 ]
Qian, Jianjun [1 ]
Xie, Jin [2 ,3 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, PCA Lab, Nanjing, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[3] Nanjing Univ, Sch Intelligence Sci & Technol, Suzhou, Peoples R China
[4] Nanjing Univ Informat Sci & Technol, B DAT, Nanjing, Peoples R China
[5] Nanjing Univ Informat Sci & Technol, CICAEET, Nanjing, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LVI | 2025 / Vol. 15114
Keywords
LiDAR data generation; self-driving; diffusion models; VISION;
DOI
10.1007/978-3-031-72992-8_17
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Complex traffic environments and varied weather conditions make collecting LiDAR data expensive and challenging. High-quality, controllable LiDAR data generation is therefore urgently needed; text is a common control modality, yet it has received little attention in this field. To this end, we propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generation model. Specifically, we design an equirectangular transformer architecture, using the proposed equirectangular attention to capture LiDAR features in a manner suited to the data's characteristics. We then design a control-signal embedding injector that efficiently integrates control signals through a global-to-focused attention mechanism. Additionally, we devise a frequency modulator that helps the model recover high-frequency details, ensuring the clarity of the generated point clouds. To foster development in the field and optimize text-controlled generation performance, we construct nuLiDARtext, which offers diverse text descriptors for 34,149 LiDAR point clouds from 850 scenes. Experiments on uncontrolled and text-controlled generation in various forms on the KITTI-360 and nuScenes datasets demonstrate the superiority of our approach. The project can be found at https://github.com/wuyang98/Text2LiDAR.
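The "equirectangular" design reflects a basic property of LiDAR range images: an equirectangular projection is periodic in azimuth, so the left and right image borders are physically adjacent. As an illustrative sketch only (not the paper's implementation, whose attention mechanism is defined in the full text), circular padding along the width axis is one common way to respect this 360° wrap-around before applying windowed attention or convolution:

```python
import numpy as np

def circular_pad_azimuth(range_image, pad):
    """Pad a LiDAR range image along the azimuth (width) axis with wrap-around.

    In an equirectangular projection the azimuth axis is periodic, so the
    columns just past the right edge are the leftmost columns, and vice versa.
    Illustrative sketch; hypothetical helper, not from the paper.
    """
    left = range_image[:, -pad:]   # wrap the last columns to the left side
    right = range_image[:, :pad]   # wrap the first columns to the right side
    return np.concatenate([left, range_image, right], axis=1)

# Toy 3x4 "range image": 3 elevation rows, 4 azimuth columns.
img = np.arange(12, dtype=float).reshape(3, 4)
padded = circular_pad_azimuth(img, 1)
# padded has shape (3, 6); its first column equals img's last column.
```

A local operator applied to `padded` then sees a seamless field of view across the 0°/360° boundary instead of an artificial image edge.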
Pages: 291-310 (20 pages)