End-to-End Paired Ambisonic-Binaural Audio Rendering

Cited by: 1
Authors
Zhu, Yin [1]
Kong, Qiuqiang [2, 3]
Shi, Junjie [2, 3]
Liu, Shilei [2, 3]
Ye, Xuzhou [2, 3]
Wang, Ju-Chiang [2, 3]
Shan, Hongming [4, 5, 6]
Zhang, Junping [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[2] Beijing ByteDance Technol Co Ltd, Shanghai 201102, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain Inspired Intelligence, Shanghai 200433, Peoples R China
[5] Fudan Univ, MOE Frontiers Ctr Brain Sci, Shanghai 200433, Peoples R China
[6] Shanghai Ctr Brain Sci & Brain Inspired Technol, Shanghai 200031, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Measurement; Costs; Neural networks; Virtual reality; Rendering (computer graphics); Task analysis; Optimization; Ambisonic; attention; binaural rendering; neural network;
DOI
10.1109/JAS.2023.123969
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is challenging for machines, since describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and on optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, as traditional methods achieve personalization by measuring HRTFs. The second is insufficient accuracy, because traditional methods optimize to retain the perceptually more important part of the information at the cost of discarding the rest. Therefore, it is desirable to develop novel techniques that achieve personalization and accuracy at a low cost. To this end, we focus on the binaural rendering of ambisonic signals and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content information and spatial information accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
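The abstract mentions a loss function that quantifies interaural level differences (ILD) but does not give its form. As a rough illustration only, the sketch below uses the common textbook definition of ILD as the energy ratio between the left and right channels in decibels, computed per frame, and compares predicted and reference binaural signals by mean absolute difference. The function names `ild_db` and `ild_loss`, the frame and hop sizes, and the absolute-difference comparison are all assumptions made for this example, not the paper's formulation.

```python
import numpy as np


def ild_db(left, right, eps=1e-8):
    """Interaural level difference in dB: 10 * log10(E_left / E_right).

    Textbook definition, assumed here; eps avoids division by zero.
    """
    e_left = np.sum(np.square(left)) + eps
    e_right = np.sum(np.square(right)) + eps
    return 10.0 * np.log10(e_left / e_right)


def ild_loss(pred, target, frame=1024, hop=512):
    """Mean absolute frame-wise ILD difference between two (2, T) signals.

    Hypothetical loss for illustration; row 0 is the left channel,
    row 1 is the right channel.
    """
    n_samples = pred.shape[1]
    diffs = []
    for start in range(0, n_samples - frame + 1, hop):
        seg = slice(start, start + frame)
        ild_pred = ild_db(pred[0, seg], pred[1, seg])
        ild_true = ild_db(target[0, seg], target[1, seg])
        diffs.append(abs(ild_pred - ild_true))
    return float(np.mean(diffs))


# Usage: a left channel twice as loud as the right gives an ILD of about 6 dB.
rng = np.random.default_rng(0)
right = rng.standard_normal(4096)
target = np.stack([right, right])      # identical ears, ILD of 0 dB
pred = np.stack([2.0 * right, right])  # left amplitude doubled
print(ild_loss(pred, target))
```

Doubling the amplitude of one channel quadruples its energy, so the frame-wise ILD shifts by 10 log10(4), roughly 6 dB, which is what the usage example reports as the loss.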
Pages: 502-513
Page count: 12