LaSNet: An end-to-end network based on steering vector filter for sound source localization and separation

Cited by: 3

Authors:

Yang, Xiaokang [1]

Zhang, Hongcheng [1]

Lu, Yufei [1]

A, Ying [1]

Ren, Guangyi [1]

Wei, Jianguo [1]

Wang, Xianliang [2]

Li, Wei [2]

Affiliations:

[1] Tianjin Univ, 92 Weijin Road, Tianjin 300072, Peoples R China

[2] NIO, Laiguangying West Rd, Beijing 100020, Peoples R China

Source:

APPLIED ACOUSTICS | 2023 / Vol. 212

Keywords:

Source localization and separation; Separation driving localization network; Steering vector beamforming; Localization and separation network; ACOUSTIC SOURCE LOCALIZATION; SPEECH SEPARATION; ALGORITHM;

DOI:

10.1016/j.apacoust.2023.109562

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we propose a novel time-domain end-to-end network (LaSNet) for multiple sound source localization (SSL) and separation based on a microphone array. The traditional time-frequency (T-F) signal representation is subject to various prior conditions and fails to separate the different sound signal components. Even data-driven neural networks have not produced an effectively integrated approach in which localization and separation interact to serve both challenges. To address this issue, we first propose a Separation Driving Localization Network (SDLNet), a framework that extracts latent features from a separation network and then uses them in a localization network. We then propose a simple multi-task network for both SSL and separation. By analyzing the steering vector filter, we find that the localization and separation problems can be linked by the pseudo-inverse (pinv) operation. To let SSL and sound separation reinforce each other while still allowing end-to-end network training, we develop a Pinv Module (PM). Finally, the Localization and Separation Network (LaSNet) structure is proposed. Inspired by the overlay mechanism of networks, LaSNet is extended to a multi-task, multi-layer network in which the separation task is divided into multiple subtasks. A fuzzy separation loss function is introduced for training the multi-layer network. Numerical experiments demonstrate that the proposed method clearly outperforms several well-known models. Compared with the baseline models, LaSNet substantially improves both separation and localization performance and achieves at least a 32% relative reduction in model size.
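The link between localization and separation through the pseudo-inverse can be illustrated with a small numerical sketch. This is not the paper's Pinv Module: it assumes a noise-free narrowband model `mixture = A @ sources`, a far-field uniform linear array with half-wavelength spacing, and hypothetical helper names (`steering_matrix`, `separate`). Under those assumptions, knowing the directions of arrival (DOAs) fixes the steering matrix `A`, and applying `pinv(A)` to the mixture recovers the sources, while wrong DOAs give a poor separation.

```python
import numpy as np

def steering_matrix(doas_deg, n_mics, spacing_over_wavelength=0.5):
    """Far-field steering matrix of a uniform linear array.

    Returns a complex array of shape (n_mics, n_sources).
    The array geometry and spacing are assumptions of this sketch.
    """
    doas = np.deg2rad(np.asarray(doas_deg, dtype=float))
    mic_idx = np.arange(n_mics)[:, None]
    phase = -2j * np.pi * spacing_over_wavelength * mic_idx * np.sin(doas)[None, :]
    return np.exp(phase)

def separate(mixture, doas_deg, spacing_over_wavelength=0.5):
    """Estimate the sources from a (n_mics, n_frames) mixture given the DOAs."""
    A = steering_matrix(doas_deg, mixture.shape[0], spacing_over_wavelength)
    # pinv(A) @ A is the identity when A has full column rank,
    # so the sources are recovered exactly in the noise-free case.
    return np.linalg.pinv(A) @ mixture

# Synthetic example: two complex narrowband sources, four microphones.
rng = np.random.default_rng(0)
true_doas = [-30.0, 20.0]
sources = rng.standard_normal((2, 100)) + 1j * rng.standard_normal((2, 100))
mixture = steering_matrix(true_doas, n_mics=4) @ sources

estimated = separate(mixture, true_doas)       # correct localization
wrong = separate(mixture, [-10.0, 50.0])       # incorrect localization

err_true = np.linalg.norm(estimated - sources)
err_wrong = np.linalg.norm(wrong - sources)
print(f"error with true DOAs:  {err_true:.2e}")
print(f"error with wrong DOAs: {err_wrong:.2e}")
```

In this toy setting the separation error depends directly on the localization estimate, which is the kind of coupling the abstract describes; a real system would also have to handle noise, reverberation, and wideband signals.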

References

Pages: 7

42 in total

