Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer

Cited by: 0
Authors
Wu, Yang [1 ]
Zhang, Kaihua [4 ,5 ]
Qian, Jianjun [1 ]
Xie, Jin [2 ,3 ]
Yang, Jian [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, PCA Lab, Nanjing, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[3] Nanjing Univ, Sch Intelligence Sci & Technol, Suzhou, Peoples R China
[4] Nanjing Univ Informat Sci & Technol, B DAT, Nanjing, Peoples R China
[5] Nanjing Univ Informat Sci & Technol, CICAEET, Nanjing, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LVI | 2025 / Vol. 15114
Keywords
LiDAR data generation; self-driving; diffusion models; VISION;
DOI
10.1007/978-3-031-72992-8_17
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Complex traffic environments and varied weather conditions make collecting LiDAR data expensive and challenging. High-quality, controllable LiDAR data generation is therefore urgently needed; text is a common control modality, yet it has received little attention in this field. To this end, we propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generation model. Specifically, we design an equirectangular transformer architecture, using the proposed equirectangular attention to capture LiDAR features in a manner suited to the data's characteristics. We then design a control-signal embedding injector that efficiently integrates control signals through a global-to-focused attention mechanism. Additionally, we devise a frequency modulator that helps the model recover high-frequency details, ensuring the clarity of the generated point clouds. To foster development in the field and optimize text-controlled generation performance, we construct nuLiDARtext, which offers diverse text descriptors for 34,149 LiDAR point clouds from 850 scenes. Experiments on uncontrolled and text-controlled generation in various forms on the KITTI-360 and nuScenes datasets demonstrate the superiority of our approach. The project can be found at https://github.com/wuyang98/Text2LiDAR.
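The "equirectangular" design reflects a basic property of LiDAR range images: an equirectangular projection is periodic in azimuth, so the left and right image borders are physically adjacent. As an illustrative sketch only (not the paper's implementation, whose attention mechanism is defined in the full text), circular padding along the width axis is one common way to respect this 360° wrap-around before applying windowed attention or convolution:

```python
import numpy as np

def circular_pad_azimuth(range_image, pad):
    """Pad a LiDAR range image along the azimuth (width) axis with wrap-around.

    In an equirectangular projection the azimuth axis is periodic, so the
    columns just past the right edge are the leftmost columns, and vice versa.
    Illustrative sketch; hypothetical helper, not from the paper.
    """
    left = range_image[:, -pad:]   # wrap the last columns to the left side
    right = range_image[:, :pad]   # wrap the first columns to the right side
    return np.concatenate([left, range_image, right], axis=1)

# Toy 3x4 "range image": 3 elevation rows, 4 azimuth columns.
img = np.arange(12, dtype=float).reshape(3, 4)
padded = circular_pad_azimuth(img, 1)
# padded has shape (3, 6); its first column equals img's last column.
```

A local operator applied to `padded` then sees a seamless field of view across the 0°/360° boundary instead of an artificial image edge.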
Pages: 291-310 (20 pages)