Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Cited by: 1
Authors
Chen, Ziyang [1 ,2 ]
Gebru, Israel D. [2 ]
Richardt, Christian [2 ]
Kumar, Anurag [3 ]
Laney, William [2 ]
Owens, Andrew [1 ]
Richard, Alexander [2 ]
Affiliations
[1] University of Michigan, Ann Arbor, MI 48109, USA
[2] Meta, Codec Avatars Lab, Pittsburgh, PA, USA
[3] Meta, Reality Labs Research, Menlo Park, CA, USA
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024
DOI
10.1109/CVPR52733.2024.02067
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality, densely captured room impulse response data paired with multi-view images, as well as precise 6DoF pose tracking for sound emitters and listeners in the rooms. We use this dataset to evaluate existing methods for novel-view acoustic synthesis and impulse response generation, which previously relied on synthetic data. In our evaluation, we thoroughly assess existing audio and audio-visual models against multiple criteria and propose settings that enhance their performance on real-world data. We also conduct experiments to investigate the impact of incorporating visual data (i.e., images and depth) into neural acoustic field models. Additionally, we demonstrate the effectiveness of a simple sim2real approach, in which a model is pre-trained on simulated data and fine-tuned on sparse real-world data, yielding significant improvements in this few-shot setting. RAF is the first dataset to provide densely captured room acoustic data, making it an ideal resource for researchers working on audio and audio-visual neural acoustic field modeling techniques. Demos and datasets are available on our project page.
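The sim2real recipe described in the abstract — pre-train on abundant simulated data, then fine-tune on a handful of real measurements — can be sketched generically. The snippet below is an illustrative toy, not the paper's implementation: a linear model trained by gradient descent stands in for a neural acoustic field, and the synthetic "simulated" and "real" datasets merely mimic a small sim-to-real gap.

```python
import numpy as np

def train(w, X, y, lr, epochs):
    """Plain least-squares gradient descent (stand-in for acoustic-field training)."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
true_w = rng.normal(size=4)  # hypothetical "real room" parameters

# Stage 1: pre-train on abundant simulated (pose -> impulse-response feature) pairs.
X_sim = rng.normal(size=(500, 4))
y_sim = X_sim @ true_w + 0.1 * rng.normal(size=500)  # simulator approximates reality
w = train(np.zeros(4), X_sim, y_sim, lr=0.05, epochs=300)

# Stage 2: few-shot fine-tuning on sparse real captures with a small sim-to-real gap.
X_real = rng.normal(size=(10, 4))
y_real = X_real @ (true_w + 0.05) + 0.01 * rng.normal(size=10)
w = train(w, X_real, y_real, lr=0.05, epochs=200)
```

The point of the two-stage schedule is that the pre-trained weights already sit near the real-room solution, so the sparse real data only needs to close the residual sim-to-real gap rather than learn the mapping from scratch.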
Pages: 21886-21896 (11 pages)