End-to-End Paired Ambisonic-Binaural Audio Rendering

Cited by: 1
Authors
Zhu, Yin [1]
Kong, Qiuqiang [2, 3]
Shi, Junjie [2, 3]
Liu, Shilei [2, 3]
Ye, Xuzhou [2, 3]
Wang, Ju-Chiang [2, 3]
Shan, Hongming [4, 5, 6]
Zhang, Junping [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[2] Beijing ByteDance Technol Co Ltd, Shanghai 201102, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain Inspired Intelligence, Shanghai 200433, Peoples R China
[5] Fudan Univ, MOE Frontiers Ctr Brain Sci, Shanghai 200433, Peoples R China
[6] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200031, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Measurement; Costs; Neural networks; Virtual reality; Rendering (computer graphics); Task analysis; Optimization; Ambisonic; attention; binaural rendering; neural network;
DOI
10.1109/JAS.2023.123969
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines because describing a sound field often requires multiple channels and even the metadata of the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, as traditional methods achieve personalization by measuring HRTFs. The second is insufficient accuracy, because traditional methods optimize to retain the part of the information that is more important to perception at the cost of discarding another part. Therefore, it is desirable to develop novel techniques that achieve personalization and accuracy at a low cost. To this end, we focus on the binaural rendering of ambisonic audio and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content information and spatial information accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
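The abstract mentions a loss function that quantifies interaural level differences (ILD) but does not give its form. As a rough illustration only, the sketch below uses the common textbook definition of ILD (the per-frame energy ratio of the left and right channels, in dB) and takes the mean absolute ILD error between a rendered and a reference binaural signal. The function names, the frame length, and the `EPS` constant are assumptions made for this example and are not taken from the paper.

```python
import math

EPS = 1e-8  # assumed small constant to avoid log(0) on silent frames


def frame_energy(samples):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in samples)


def ild_db(left, right, frame_len=4):
    """Per-frame ILD in dB (left relative to right), non-overlapping frames."""
    n = min(len(left), len(right))
    ilds = []
    for start in range(0, n - frame_len + 1, frame_len):
        e_left = frame_energy(left[start:start + frame_len])
        e_right = frame_energy(right[start:start + frame_len])
        ilds.append(10.0 * math.log10((e_left + EPS) / (e_right + EPS)))
    return ilds


def ild_loss(pred_left, pred_right, ref_left, ref_right, frame_len=4):
    """Mean absolute ILD error between predicted and reference binaural audio.

    Hypothetical loss for illustration; the paper's actual loss may differ.
    """
    pred = ild_db(pred_left, pred_right, frame_len)
    ref = ild_db(ref_left, ref_right, frame_len)
    pairs = list(zip(pred, ref))
    if not pairs:
        return 0.0
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


if __name__ == "__main__":
    louder = [1.0] * 8
    quieter = [0.5] * 8
    # Swapping the ears flips the sign of the ILD, so the error is large.
    print(ild_loss(louder, quieter, quieter, louder))
```

A loss of this kind penalizes errors in the level balance between the two ears independently of the overall loudness, which is one plausible reason to pair it with a content-oriented loss in an end-to-end model.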
Pages: 502-513
Page count: 12
Related papers
50 in total
  • [1] End-to-End Paired Ambisonic-Binaural Audio Rendering
    Yin Zhu
    Qiuqiang Kong
    Junjie Shi
    Shilei Liu
    Xuzhou Ye
    Ju-Chiang Wang
    Hongming Shan
    Junping Zhang
    IEEE/CAA Journal of Automatica Sinica, 2024, 11 (02) : 502 - 513
  • [2] End-to-End Magnitude Least Squares Binaural Rendering of Spherical Microphone Array Signals
    Deppisch, Thomas
    Helmholz, Hannes
    Ahrens, Jens
    2021 IMMERSIVE AND 3D AUDIO: FROM ARCHITECTURE TO AUTOMOTIVE (I3DA), 2021,
  • [3] End-to-End Binaural Speech Synthesis
    Huang, Wen-Chin
    Markovic, Dejan
    Gebru, Israel D.
    Menon, Anjali
    Richard, Alexander
    INTERSPEECH 2022, 2022, : 1218 - 1222
  • [4] End-to-End Compressed Meshlet Rendering
    Mlakar, D.
    Steinberger, M.
    Schmalstieg, D.
    COMPUTER GRAPHICS FORUM, 2024, 43 (01)
  • [5] END-TO-END LEARNING FOR MUSIC AUDIO
    Dieleman, Sander
    Schrauwen, Benjamin
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] Conditional End-to-End Audio Transforms
    Haque, Albert
    Guo, Michelle
    Verma, Prateek
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2295 - 2299
  • [7] An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment
    Shivam Saini
    Isaac Engel
    Jürgen Peissig
    EURASIP Journal on Audio, Speech, and Music Processing, 2024
  • [8] An end-to-end approach for blindly rendering a virtual sound source in an audio augmented reality environment
    Saini, Shivam
    Engel, Isaac
    Peissig, Juergen
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [9] SoundStream: An End-to-End Neural Audio Codec
    Zeghidour, Neil
    Luebs, Alejandro
    Omran, Ahmed
    Skoglund, Jan
    Tagliasacchi, Marco
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 495 - 507
  • [10] END-TO-END BINAURAL SOUND LOCALISATION FROM THE RAW WAVEFORM
    Vecchiotti, Paolo
    Ma, Ning
    Squartini, Stefano
    Brown, Guy J.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 451 - 455