LaSNet: An end-to-end network based on steering vector filter for sound source localization and separation

Cited by: 3
Authors
Yang, Xiaokang [1]
Zhang, Hongcheng [1]
Lu, Yufei [1]
A, Ying [1]
Ren, Guangyi [1]
Wei, Jianguo [1]
Wang, Xianliang [2]
Li, Wei [2]
Affiliations
[1] Tianjin Univ, 92 Weijin Road, Tianjin 300072, Peoples R China
[2] NIO, Laiguangying West Rd, Beijing 100020, Peoples R China
Keywords
Source localization and separation; Separation driving localization network; Steering vector beamforming; Localization and separation network; ACOUSTIC SOURCE LOCALIZATION; SPEECH SEPARATION; ALGORITHM
DOI
10.1016/j.apacoust.2023.109562
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a novel time-domain end-to-end network (LaSNet) for multiple sound source localization (SSL) and separation based on a microphone array. The traditional time-frequency (T-F) signal representation is subject to various prior conditions and fails to separate the different sound signal components. Even data-driven neural networks have not produced an effectively integrated approach in which localization and separation interact to serve both challenges. To address this issue, we first propose a Separation Driving Localization Network (SDLNet), which extracts latent features from a separation network and then uses them in a localization network. We then propose a simple multi-task network for both SSL and separation. Through an analysis of the steering vector filter, we find that the localization and separation problems can be linked by the pseudo-inverse (pinv) operation. To facilitate a synergistic relationship between SSL and sound separation, while also enabling end-to-end network training, we develop a Pinv Module (PM). Finally, the Localization and Separation Network (LaSNet) structure of this paper is proposed. Inspired by the overlay mechanism of networks, LaSNet is extended to a multi-task, multi-layer network in which the separation task is divided into multiple subtasks. A fuzzy separation loss function is introduced for training the multi-layer network. Numerical experiments demonstrate that the proposed method clearly outperforms several well-known models. Compared with the baseline models, LaSNet greatly improves both separation and localization performance and achieves at least a 32% relative reduction in model size.
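The link between localization and separation through the pseudo-inverse can be illustrated with a classical narrowband sketch. This is not the paper's Pinv Module or any part of LaSNet; it only assumes a far-field uniform linear array, a single frequency bin, and a noiseless mixture. The array geometry, source angles, frequency, and the helper names `steering_matrix` and `residual` are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def steering_matrix(mic_x, angles_deg, freq_hz, c=343.0):
    """Far-field steering vectors of a linear array, one column per direction."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    delays = np.outer(mic_x, np.cos(angles)) / c      # (mics, sources), seconds
    return np.exp(-2j * np.pi * freq_hz * delays)

rng = np.random.default_rng(0)
mic_x = np.arange(4) * 0.05      # assumed: 4 microphones, 5 cm spacing
freq_hz = 1000.0                 # assumed: one narrowband frequency bin
true_angles = (40, 110)          # assumed source directions in degrees

A = steering_matrix(mic_x, true_angles, freq_hz)      # shape (4, 2)
S = rng.standard_normal((2, 64)) + 1j * rng.standard_normal((2, 64))
X = A @ S                        # noiseless mixture observed at the array

# Separation: with known directions, pinv(A) recovers the sources.
S_hat = np.linalg.pinv(A) @ X
separation_error = np.max(np.abs(S_hat - S))

# Localization: the candidate directions whose steering matrix best
# explains X (smallest projection residual) are the source directions.
def residual(candidate_deg):
    A_c = steering_matrix(mic_x, candidate_deg, freq_hz)
    return np.linalg.norm(X - A_c @ (np.linalg.pinv(A_c) @ X))

grid = range(10, 180, 10)
best_pair = min(itertools.combinations(grid, 2), key=residual)
print(separation_error, best_pair)
```

In this toy setting the same pseudo-inverse serves both tasks: applied with the correct steering matrix it separates the sources, and scanned over candidate directions it identifies them. A learned network would replace the grid search and the idealized steering model.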
Pages: 7
Related papers
42 in total
[1]  
Ali R, 2021, EURASIP J AUDIO SPEE, V2021, DOI 10.1186/s13636-020-00192-2
[2]   IMAGE METHOD FOR EFFICIENTLY SIMULATING SMALL-ROOM ACOUSTICS [J].
ALLEN, JB ;
BERKLEY, DA .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 (04) :943-950
[3]   A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications [J].
Alzubaidi, Laith ;
Bai, Jinshuai ;
Al-Sabaawi, Aiman ;
Santamaria, Jose ;
Albahri, A. S. ;
Al-dabbagh, Bashar Sami Nayyef ;
Fadhel, Mohammed. A. A. ;
Manoufali, Mohamed ;
Zhang, Jinglan ;
Al-Timemy, Ali. H. H. ;
Duan, Ye ;
Abdullah, Amjed ;
Farhan, Laith ;
Lu, Yi ;
Gupta, Ashish ;
Albu, Felix ;
Abbosh, Amin ;
Gu, Yuantong .
JOURNAL OF BIG DATA, 2023, 10 (01)
[4]   A survey on sound source localization in robotics: From binaural to array processing methods [J].
Argentieri, S. ;
Danes, P. ;
Soueres, P. .
COMPUTER SPEECH AND LANGUAGE, 2015, 34 (01) :87-112
[5]  
Benesty J, 2008, SPRINGER TOP SIGN PR, V1, P1
[6]   THE FIRST MULTIMODAL INFORMATION BASED SPEECH PROCESSING (MISP) CHALLENGE: DATA, TASKS, BASELINES AND RESULTS [J].
Chen, Hang ;
Zhou, Hengshun ;
Du, Jun ;
Lee, Chin-Hui ;
Chen, Jingdong ;
Watanabe, Shinji ;
Siniscalchi, Sabato Marco ;
Scharenborg, Odette ;
Liu, Di-Yuan ;
Yin, Bao-Cai ;
Pan, Jia ;
Gao, Jian-Qing ;
Liu, Cong .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :9266-9270
[7]   On the Robustness of the Superdirective Beamformer [J].
Chen, Xi ;
Benesty, Jacob ;
Huang, Gongping ;
Chen, Jingdong .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 :838-849
[8]   AN END-TO-END DEEP LEARNING FRAMEWORK FOR MULTIPLE AUDIO SOURCE SEPARATION AND LOCALIZATION [J].
Chen, Yu ;
Liu, Bowen ;
Zhang, Zijian ;
Kim, Hun-Seok .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :736-740
[9]   The LOCATA Challenge: Acoustic Source Localization and Tracking [J].
Evers, Christine ;
Loellmann, Heinrich W. ;
Mellmann, Heinrich ;
Schmidt, Alexander ;
Barfuss, Hendrik ;
Naylor, Patrick A. ;
Kellermann, Walter .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 :1620-1643
[10]   ALGORITHM FOR LINEARLY CONSTRAINED ADAPTIVE ARRAY PROCESSING [J].
FROST, OL .
PROCEEDINGS OF THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, 1972, 60 (08) :926-&