Time-Frequency Bins Selection for Direction of Arrival Estimation Based on Speech Presence Probability Learning

Cited by: 1

Authors:

Zhang, Qinzheng ^{[1,2]}

Wang, Haiyan ^{[1,3]}

Jensen, Jesper Rindom ^{[2]}

Tao, Shuai ^{[2]}

Christensen, Mads Graesboll ^{[2]}

Affiliations:

[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China

[2] Aalborg Univ, Audio Anal Lab, CREATE, DK-9000 Aalborg, Denmark

[3] Shaanxi Univ Sci & Technol, Sch Elect Informat & Artificial Intelligence, Xian 710021, Peoples R China

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2024 / Vol. 43 / Issue 05

Funding:

National Natural Science Foundation of China;

Keywords:

Broadband DOA estimation; A posteriori speech presence probability; Out-of-label task; Deep learning; NOISE; LOCALIZATION; SYSTEM; DOA;

DOI:

10.1007/s00034-023-02586-x

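Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: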

0808 ; 0809 ;

Abstract:

With the development of deep learning techniques, the field of direction of arrival (DOA) estimation has also made significant progress. However, the accuracy of DOA estimation using end-to-end neural networks (NNs) heavily relies on the classification step of the networks, which necessitates the use of large and representative datasets. Additionally, conventional speech presence probability (SPP) estimation methods based on the ideal ratio mask (IRM) may misclassify time-frequency (T-F) bins dominated by non-speech and noise, which hinders the accurate extraction of directional information. To improve the robustness of existing DOA estimation algorithms, this paper proposes a DOA estimation method with T-F bin selection. In terms of output, instead of using IRM-based SPP, our proposed approach focuses on the a posteriori SPP, a deliberate choice aimed at circumventing potential confusion. For input optimization, we construct features that encompass spatial, temporal, and directional information concurrently, and these are coupled with a frequency bin-wise recurrent neural network (RNN) model to attain precise multi-channel SPP estimation. Subsequently, these SPP estimates are utilized to extract local information for DOA estimation. Moreover, the cascaded structure ensures that the model has the ability to complete out-of-label tasks, effectively reducing the dataset requirements by training only a subset of direction information to achieve omnidirectional DOA estimation. Besides, this contributes to the algorithm's ability to eliminate its reliance on the step size, setting it apart from other end-to-end methods. Simulation results validate that the proposed method achieves higher accuracy and lower error compared to both NN-based end-to-end approaches and traditional full-band approaches under various conditions of reverberation and signal-to-noise ratio.


Pages: 2961 - 2981

Page count: 21
