Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments

Cited by: 0

Authors:

Ke, Shanfa [1,2]

Wang, Zhongyuan [1,2]

Hu, Ruimin [1,3]

Wang, Xiaochen [1,3]

Affiliations:

[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan 430072, Peoples R China

[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Peoples R China

[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2023 / Vol. 55 / Issue 01

Funding:

National Key Research and Development Program of China;

Keywords:

Multi-speaker separation; Isolated speech segments; Deep embedding network; Attractor point; SOUND SOURCE SEPARATION;

DOI:

10.1007/s11063-022-10887-6

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In a real multi-speaker scenario, the signal collected by the microphone contains many time periods in which only one speaker is talking; these are referred to as isolated speech segments. Building on this observation, this paper proposes a single-channel multi-speaker speech separation method based on the similarity between a speaker feature center and the mixture feature in the deep embedding space. Specifically, the isolated speech segments extracted from the observed signal are converted to deep embedding vectors, from which a speaker feature center is created. The similarity between this center and the deep embedding feature of the mixture is used as the mask of the corresponding speaker, and this mask separates that speaker's speech. A residual-based deep embedding network with stacked 2-D convolutional blocks, in place of bi-directional long short-term memory, is proposed for faster speed and better feature extraction. In addition, an isolated speech segment extraction method based on Chimera++ is proposed, because previous experiments showed that the Chimera++ algorithm achieves good separation performance on segments from only one speaker. The evaluation results on the general datasets show that the proposed method outperforms competing algorithms by up to 0.94 dB in Signal-to-Distortion Ratio.
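The masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state how the center is computed or which similarity function is used, so the sketch assumes the center is the mean embedding over isolated frames and the similarity is a sigmoid of the dot product. The names `speaker_center` and `similarity_mask`, the array shapes, and the random stand-in data are all hypothetical.

```python
import numpy as np


def speaker_center(embeddings, isolated):
    """Average the embeddings of all T-F bins in frames marked as isolated.

    embeddings: array of shape (T, F, D), one D-dim vector per T-F bin.
    isolated:   boolean array of shape (T,), True where one speaker talks alone.
    Assumption: the center is a plain mean; the abstract does not specify this.
    """
    selected = embeddings[isolated]                      # (n_isolated, F, D)
    return selected.reshape(-1, embeddings.shape[-1]).mean(axis=0)


def similarity_mask(embeddings, center):
    """Turn the similarity between each T-F embedding and the center into a mask.

    Assumption: similarity = sigmoid(dot product), which keeps values in (0, 1).
    """
    scores = embeddings @ center                         # (T, F)
    return 1.0 / (1.0 + np.exp(-scores))


# Random stand-ins for the network output and the mixture magnitude spectrogram.
rng = np.random.default_rng(0)
T, F, D = 6, 4, 8
embeddings = rng.standard_normal((T, F, D))
isolated = np.array([True, True, False, False, False, False])
mixture_mag = np.abs(rng.standard_normal((T, F)))

center = speaker_center(embeddings, isolated)
mask = similarity_mask(embeddings, center)
estimate = mask * mixture_mag                            # masked magnitude for this speaker
```

In a real system the embeddings would come from the trained embedding network and the isolated-frame flags from the segment extraction stage; the sketch only shows how a center and a soft mask could be derived once both are available.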


Pages: 385-400

Page count: 16

