Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments

Authors
Ke, Shanfa [1, 2]
Wang, Zhongyuan [1, 2]
Hu, Ruimin [1, 3]
Wang, Xiaochen [1, 3]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Peoples R China
[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China
Funding
National Key Research and Development Program
Keywords
Multi-speaker separation; Isolated speech segments; Deep embedding network; Attractor point; Sound source separation
DOI
10.1007/s11063-022-10887-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In real multi-speaker scenarios, the signal collected by the microphone contains many time periods in which only one speaker is talking; these are called isolated speech segments. Based on this observation, this paper proposes a single-channel multi-speaker speech separation method that uses the similarity between a speaker feature center and the mixture features in the deep embedding space. Specifically, the isolated speech segments extracted from the observed signal are converted to deep embedding vectors, from which a speaker feature center is created. The similarity between this center and the deep embedding features of the mixture is used to construct a mask for the corresponding speaker, and this mask is used to separate that speaker's speech. A residual-based deep embedding network that uses stacked 2-D convolutional blocks instead of bi-directional long short-term memory is proposed for faster speed and better feature extraction. In addition, an isolated speech segment extraction method based on Chimera++ is proposed, because previous experiments showed that the Chimera++ algorithm achieves good separation performance on segments from only one speaker. Evaluation results on general datasets show that the proposed method substantially outperforms competing algorithms, by up to 0.94 dB in Signal-to-Distortion Ratio.
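The center-and-mask idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity function or how the center is computed, so the sketch assumes the center is the mean of the embeddings from a speaker's isolated segments and that masks are a softmax over dot-product similarities (as in attractor-style methods). The function names `speaker_center` and `similarity_masks` and all numeric values are hypothetical.

```python
import math

def speaker_center(embeddings):
    """Assumed center: the mean of the embedding vectors taken from
    one speaker's isolated speech segments."""
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(vec[d] for vec in embeddings) / count for d in range(dim)]

def similarity_masks(mixture_embeddings, centers):
    """For each mixture embedding (one per time-frequency bin), return a
    soft mask over speakers: softmax of dot-product similarity to each
    center. The softmax choice is an assumption, not taken from the paper."""
    masks = []
    for vec in mixture_embeddings:
        scores = [sum(a * b for a, b in zip(vec, center)) for center in centers]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        masks.append([value / total for value in exps])
    return masks

# Toy 2-D embeddings (hypothetical values) for two speakers.
isolated_a = [[1.0, 0.0], [0.9, 0.1]]
isolated_b = [[0.0, 1.0], [0.1, 0.9]]
centers = [speaker_center(isolated_a), speaker_center(isolated_b)]

# Three mixture bins: dominated by A, dominated by B, and ambiguous.
mixture = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
masks = similarity_masks(mixture, centers)
```

In a full system each mask would be multiplied element-wise with the mixture spectrogram to estimate one speaker's spectrogram; the embedding network and the Chimera++-based segment extraction described in the abstract are outside the scope of this sketch.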
Pages: 385-400
Number of pages: 16
Related papers
50 in total
  • [1] Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments
    Shanfa Ke
    Zhongyuan Wang
    Ruimin Hu
    Xiaochen Wang
    Neural Processing Letters, 2023, 55 : 385 - 400
  • [2] MULTI-SPEAKERS SPEECH SEPARATION BASED ON MODIFIED ATTRACTOR POINTS ESTIMATION AND GMM CLUSTERING
    Ke, Shanfa
    Hu, Ruimin
    Li, Gang
    Wu, Tingzhao
    Wang, Xiaochen
    Wang, Zhongyuan
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1414 - 1419
  • [3] Single-channel speech separation based on modulation frequency
    Gu, Lingyun
    Stern, Richard M.
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 25 - 28
  • [4] Single-channel Speech Separation based on Gaussian Process Regression
    Le Dinh Nguyen
    Chen, Sih-Huei
    Tai, Tzu-Chiang
    Wang, Jia-Ching
    2018 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2018), 2018, : 275 - 278
  • [5] Single-channel speech separation using empirical mode decomposition and multi pitch information with estimation of number of speakers
    Prasanna Kumar M.K.
    Kumaraswamy R.
    International Journal of Speech Technology, 2017, 20 (01) : 109 - 125
  • [6] A MAP CRITERION FOR DETECTING THE NUMBER OF SPEAKERS AT FRAME LEVEL IN MODEL-BASED SINGLE-CHANNEL SPEECH SEPARATION
    Mowlaee, P.
    Christensen, M. G.
    Tan, Z. -H.
    Jensen, S. H.
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 538 - 541
  • [7] CATALOG-BASED SINGLE-CHANNEL SPEECH-MUSIC SEPARATION FOR AUTOMATIC SPEECH RECOGNITION
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 2133 - 2137
  • [8] SINGLE-CHANNEL SPEECH SEPARATION BASED ON ROBUST SPARSE BAYESIAN LEARNING
    Wang, Zhe
    Bi, Guoan
    Li, Xiumei
    2017 13TH IEEE INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2017, : 113 - 117
  • [9] Single-Channel Speech Separation Based on Deep Clustering with Local Optimization
    Fu, Taotao
    Yu, Ge
    Guo, Lili
    Wang, Yan
    Liang, Ji
    2017 3RD INTERNATIONAL CONFERENCE ON FRONTIERS OF SIGNAL PROCESSING (ICFSP), 2017, : 44 - 49
  • [10] Catalog-Based Single-Channel Speech-Music Separation
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2786 - +