Semi-Supervised Source Localization in Reverberant Environments With Deep Generative Modeling

Times cited: 22
Authors
Bianco, Michael J. [1]
Gannot, Sharon [2]
Fernandez-Grande, Efren [3]
Gerstoft, Peter [1]
Affiliations
[1] Univ Calif San Diego, Marine Phys Lab, San Diego, CA 92093 USA
[2] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
[3] Tech Univ Denmark, Dept Elect Engn, DK-2800 Lyngby, Denmark
Funding
European Union Horizon 2020
Keywords
Acoustics; Location awareness; Direction-of-arrival estimation; Microphones; Data models; Task analysis; Position measurement; Source localization; Semi-supervised learning; Generative modeling; Deep learning; Acoustic source localization; Speech enhancement; Sound; Location
DOI
10.1109/ACCESS.2021.3087697
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Localization in reverberant environments remains an open challenge. Recently, supervised learning approaches have demonstrated very promising results in addressing reverberation. However, even with large data volumes, the number of labels available for supervised learning in such environments is usually small. We propose to address this issue with a semi-supervised learning (SSL) approach based on deep generative modeling. Our chosen deep generative model, the variational autoencoder (VAE), is trained to generate the phase of relative transfer functions (RTFs) between microphones. In parallel, a direction-of-arrival (DOA) classifier network based on RTF-phase is also trained. The joint generative and discriminative model, termed VAE-SSL, is trained using labeled and unlabeled RTF-phase sequences. In learning to generate and classify the sequences, the VAE-SSL separates the physical causes of the RTF-phase (i.e., source location) from distracting signal characteristics such as noise and speech activity. This enables effective end-to-end operation of the VAE-SSL, which requires minimal preprocessing of the RTF-phase. VAE-SSL is compared with two signal-processing-based approaches, steered response power with phase transform (SRP-PHAT) and MUltiple SIgnal Classification (MUSIC), as well as with fully supervised CNNs. The approaches are compared using data from two real acoustic environments, one of which was recently recorded at the Technical University of Denmark specifically for our study. We find that VAE-SSL can outperform the conventional approaches and the CNN in label-limited scenarios. Further, the trained VAE-SSL system can generate new RTF-phase samples that capture the physics of the acoustic environment. Thus, the generative modeling in VAE-SSL provides a means of interpreting the learned representations. To the best of our knowledge, this paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling.
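The input feature named in the abstract, the RTF-phase between a microphone and a reference microphone, can be illustrated with a minimal sketch. It assumes a common cross-spectral RTF estimator averaged over Hann-windowed frames. The helper names (`dft_half`, `rtf_phase`), the frame length, and the hop are hypothetical. This record does not state how the authors compute, average, or normalize the RTF-phase, so this is not their preprocessing.

```python
import cmath
import math
from typing import List, Sequence


def dft_half(frame: Sequence[float]) -> List[complex]:
    """Naive DFT of a real frame, bins 0..N/2 only (O(N^2), fine for illustration)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def rtf_phase(ref: Sequence[float], mic: Sequence[float],
              frame_len: int = 16, hop: int = 8) -> List[float]:
    """Hypothetical helper: RTF-phase of `mic` relative to reference `ref`.

    Assumed estimator: H(k) = sum_l X_mic(l,k) conj(X_ref(l,k)) / sum_l |X_ref(l,k)|^2.
    The denominator is real and positive, so the phase of H(k) equals the
    phase of the frame-summed cross-spectrum, which is what is returned.
    """
    # Periodic Hann window (divides by frame_len, not frame_len - 1).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    cross = [0j] * (frame_len // 2 + 1)
    for start in range(0, min(len(ref), len(mic)) - frame_len + 1, hop):
        x_ref = dft_half([w * s for w, s in zip(window, ref[start:start + frame_len])])
        x_mic = dft_half([w * s for w, s in zip(window, mic[start:start + frame_len])])
        for k, (a, b) in enumerate(zip(x_mic, x_ref)):
            cross[k] += a * b.conjugate()  # accumulate cross-spectrum over frames
    # cmath.phase returns wrapped values in [-pi, pi]; no unwrapping is done.
    return [cmath.phase(c) for c in cross]


if __name__ == "__main__":
    n = 16
    # Tone centered on bin 2; the second microphone hears it one sample later.
    ref = [math.cos(2 * math.pi * 2 * t / n) for t in range(64)]
    mic = [math.cos(2 * math.pi * 2 * (t - 1) / n) for t in range(64)]
    print(round(rtf_phase(ref, mic, frame_len=n, hop=8)[2], 4))  # -> -0.7854 (= -pi/4)
```

For a pure delay of d samples, the RTF-phase at bin k is -2πkd/N (wrapped), so the example yields -π/4 at bin 2. Summing the cross-spectrum over frames before taking its angle is one design choice that suppresses noise. Keeping per-frame phases instead would give a sequence over time.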
Pages: 84956-84970
Number of pages: 15