Binaural Multichannel Blind Speaker Separation With a Causal Low-Latency and Low-Complexity Approach

Times cited: 2

Authors:

Westhausen, Nils L. [1]

Meyer, Bernd T. [2]

Affiliations:

[1] Carl Von Ossietzky Univ Oldenburg, Commun Acoust, D-26129 Oldenburg, Germany

[2] Cluster Excellence Hearing4All, D-26129 Oldenburg, Germany

Source:

IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2024 / Volume 5

Keywords:

Complexity theory; Low latency communication; Speech enhancement; Convolution; Signal processing algorithms; Microphones; MIMO communication; Binaural; low-latency; multi-channel; real-time; speaker-separation; SPEECH SEPARATION; EFFICIENT;

DOI:

10.1109/OJSP.2023.3343320

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this article, we introduce a causal low-latency low-complexity approach for binaural multichannel blind speaker separation in noisy reverberant conditions. The model, referred to as Group Communication Binaural Filter and Sum Network (GCBFSnet), predicts complex filters for filter-and-sum beamforming in the time-frequency domain. We apply Group Communication (GC), i.e., latent model variables are split into groups and processed with a shared sequence model, with the aim of reducing the complexity of a simple model containing only one convolutional and one recurrent module. With GC, we are able to reduce the size of the model by up to 83% and the complexity by up to 73% compared to the model without GC, while mostly retaining performance. Even for the smallest model configuration, GCBFSnet matches the performance of a low-complexity TasNet baseline in most metrics despite the larger size and higher number of required operations of the baseline.
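
Illustrative note: the abstract mentions filter-and-sum beamforming with complex filters in the time-frequency domain. The following is a minimal sketch of that general operation only, assuming single-tap filters (one complex weight per channel, frame, and frequency bin). The function name, array layout, and toy values are hypothetical and are not taken from the article; in the described approach a network would predict the filters, whereas here they are simply given.

```python
# Hypothetical sketch: single-tap filter-and-sum in the time-frequency domain.
# Names, shapes, and values are assumptions for illustration, not the paper's.

def filter_and_sum(mix, filters):
    """Multiply each channel by its complex filter and sum over channels.

    mix[c][t][f]     : complex STFT value of microphone channel c
    filters[c][t][f] : complex filter weight (assumed given here)
    returns out[t][f]: complex STFT of a single output signal
    """
    n_ch = len(mix)
    n_frames = len(mix[0])
    n_bins = len(mix[0][0])
    out = [[0j] * n_bins for _ in range(n_frames)]
    for c in range(n_ch):
        for t in range(n_frames):
            for f in range(n_bins):
                out[t][f] += filters[c][t][f] * mix[c][t][f]
    return out

# Toy input: two channels, one frame, two frequency bins.
mix = [[[1 + 1j, 2 + 0j]], [[1 - 1j, 0 + 2j]]]
filters = [[[0.5 + 0j, 1 + 0j]], [[0.5 + 0j, 0 - 1j]]]
out = filter_and_sum(mix, filters)
```

For a binaural multi-speaker output, one would presumably apply a separate filter set per output ear and per speaker; that extension is likewise an assumption here.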

Pages: 238-247

Page count: 10

