Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network

Cited by: 58

Authors:

Nguyen, Thi Ngoc Tho [1]

Gan, Woon-Seng [1]

Ranjan, Rishabh [1]

Jones, Douglas L. [2]

Affiliations:

[1] Nanyang Technol Univ, Dept Elect & Elect Engn, Singapore 639798, Singapore

[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61801 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2020 / Vol. 28

Keywords:

Direction-of-arrival estimation; Estimation; Two dimensional displays; Reverberation; Speech processing; Robustness; Convolutional neural networks; convolutional neural network; spatial pseudo-spectrum; multi-task learning; multiple sound sources; SOURCE LOCALIZATION; MULTIPLE;

DOI:

10.1109/TASLP.2020.3019646

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Many signal processing-based methods for sound source direction-of-arrival estimation produce a spatial pseudo-spectrum whose local maxima strongly indicate the source directions. Due to different levels of noise and reverberation and different numbers of overlapping sources, the spatial pseudo-spectra are noisy even after smoothing. In addition, the number of sources is often unknown. As a result, selecting the peaks from these spectra is susceptible to error. Convolutional neural networks have been successfully applied to many image processing problems in general and to direction-of-arrival estimation in particular. In addition, deep learning-based methods for direction-of-arrival estimation show good generalization to different environments. We propose to use a 2D convolutional neural network with multi-task learning to robustly estimate the number of sources and the directions-of-arrival from short-time spatial pseudo-spectra, which carry useful directional information from the audio input signals. This approach reduces the tendency of the neural network to learn unwanted associations between sound classes and directional information, and helps the network generalize to unseen sound classes. The simulation and experimental results show that the proposed methods outperform other direction-of-arrival estimation methods under different levels of noise and reverberation and different numbers of sources.
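The peak-selection step that the abstract describes as error-prone can be illustrated with a small sketch. This is not the paper's CNN method; it is a naive baseline, assuming a 1D azimuth pseudo-spectrum sampled on a circular 1-degree grid and a known number of sources. The names `pick_peaks` and `toy_spectrum`, the Gaussian lobe shape, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def toy_spectrum(doas_deg, width_deg=10.0, n_bins=360):
    """Synthetic pseudo-spectrum: one Gaussian lobe per source (an assumption)."""
    az = np.arange(n_bins)
    spec = np.zeros(n_bins)
    for d in doas_deg:
        # Wrapped angular distance in degrees, in [-180, 180).
        diff = (az - d + 180) % 360 - 180
        spec += np.exp(-0.5 * (diff / width_deg) ** 2)
    return spec

def pick_peaks(spectrum, num_sources):
    """Return the bin indices of the num_sources largest local maxima."""
    s = np.asarray(spectrum, dtype=float)
    # Compare each bin with its circular neighbours.
    is_peak = (s > np.roll(s, 1)) & (s >= np.roll(s, -1))
    peak_idx = np.flatnonzero(is_peak)
    # Rank the local maxima by height and keep the strongest ones.
    order = np.argsort(s[peak_idx])[::-1]
    return np.sort(peak_idx[order[:num_sources]])

clean = toy_spectrum([60, 200])
est_clean = pick_peaks(clean, num_sources=2)

# Additive noise creates many spurious local maxima, so the same rule
# can return wrong directions, especially if num_sources is misjudged.
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
est_noisy = pick_peaks(noisy, num_sources=2)

print("clean:", est_clean)
print("noisy:", est_noisy)
```

On the clean spectrum the two lobes are recovered exactly; on the noisy one the result depends on the noise realization, and the rule still requires the source count as an input, which is the dependency the abstract proposes to remove by estimating the count jointly with the directions.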


Pages: 2626-2637

Page count: 12

References: 30 in total
