Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding

Cited by: 3

Authors:

Chakravarthula, Praneeth [1]

D'Souza, Jim Aldon [2]

Tseng, Ethan [1]

Bartusek, Joe [1]

Heide, Felix [1,2]

Affiliations:

[1] Princeton Univ, Princeton, NJ 08544 USA

[2] Algolux, Montreal, PQ, Canada

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Keywords:

VISION; LOCALIZATION; TRACKING

DOI:

10.1109/CVPR52729.2023.00101

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Mobile robots, including autonomous vehicles, rely heavily on sensors that use electromagnetic radiation, such as lidars, radars, and cameras, for perception. While effective in most scenarios, these sensors can be unreliable in unfavorable environmental conditions, including low-light scenarios and adverse weather, and they can only detect obstacles within their direct line of sight. Audible sound from other road users propagates as acoustic waves that carry information even in challenging scenarios. However, their low spatial resolution and lack of directional information have made them an overlooked sensing modality. In this work, we introduce long-range acoustic beamforming of sound produced by road users in the wild as a complementary sensing modality to traditional electromagnetic radiation-based sensors. To validate our approach and encourage further work in the field, we also introduce the first-ever multimodal long-range acoustic beamforming dataset. We propose a neural aperture expansion method for beamforming and demonstrate its effectiveness for multimodal automotive object detection when coupled with RGB images in challenging automotive scenarios, where camera-only approaches fail or are unable to provide ultra-fast acoustic sensing sampling rates. Data and code can be found here.

Citation

Pages: 982-991

Page count: 10

51 references in total

