Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments

Cited by: 0

Authors:

Ke, Shanfa [1,2]

Wang, Zhongyuan [1,2]

Hu, Ruimin [1,3]

Wang, Xiaochen [1,3]

Affiliations:

[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan 430072, Peoples R China

[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Peoples R China

[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2023 / Vol. 55 / Issue 01

Funding:

National Key Research and Development Program of China;

Keywords:

Multi-speaker separation; Isolated speech segments; Deep embedding network; Attractor point; SOUND SOURCE SEPARATION;

DOI:

10.1007/s11063-022-10887-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In real multi-speaker scenarios, the signal collected by a microphone contains many time periods in which only one speaker is talking; these are called isolated speech segments. Based on this observation, this paper proposes a single-channel multi-speaker speech separation method that uses the similarity between a speaker feature center and the mixture features in a deep embedding space. Specifically, the isolated speech segments extracted from the observed signal are converted to deep embedding vectors, from which a speaker feature center is created. The similarity between this center and the deep embedding features of the mixture is used to construct a mask for the corresponding speaker, and the mask is then used to separate that speaker's speech. A residual-based deep embedding network with stacked 2-D convolutional blocks, instead of bi-directional long short-term memory, is proposed for faster speed and better feature extraction. In addition, an isolated speech segment extraction method based on Chimera++ is proposed, because previous experiments showed that the Chimera++ algorithm achieves good separation performance on segments containing only one speaker. Evaluation results on the general datasets show that the proposed method substantially outperforms competing algorithms, by up to 0.94 dB in Signal-to-Distortion Ratio.
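The center-and-similarity idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract does not state which similarity function or normalization is used, so the dot product followed by a softmax over speakers is an assumption, the function names `speaker_center` and `similarity_masks` are hypothetical, and the hand-written 2-D vectors stand in for the output of the deep embedding network.

```python
import numpy as np

def speaker_center(embeddings, isolated):
    """Average the embeddings of the bins flagged as isolated speech of one speaker."""
    return embeddings[isolated].mean(axis=0)

def similarity_masks(embeddings, centers):
    """Turn similarity to each speaker center into soft masks.

    ASSUMPTION: dot-product similarity and a softmax over speakers;
    the abstract does not specify either choice.
    """
    scores = embeddings @ centers.T               # shape: (bins, speakers)
    scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy embeddings for six time-frequency bins (stand-in for network output).
emb = np.array([
    [1.0, 0.0], [0.9, 0.1],   # bins from an isolated segment of speaker A
    [0.0, 1.0], [0.1, 0.9],   # bins from an isolated segment of speaker B
    [0.8, 0.2], [0.2, 0.8],   # bins from an overlapped (mixed) region
])
isolated_a = np.array([True, True, False, False, False, False])
isolated_b = np.array([False, False, True, True, False, False])

centers = np.stack([speaker_center(emb, isolated_a),
                    speaker_center(emb, isolated_b)])
masks = similarity_masks(emb, centers)  # one column per speaker
```

In a full system each mask column would be reshaped to the spectrogram's time-frequency grid and multiplied with the mixture magnitude before resynthesis; that step is omitted here.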


Pages: 385-400

Page count: 16
