Single-channel Multi-speakers Speech Separation Based on Isolated Speech Segments

Cited by: 0
Authors
Ke, Shanfa [1, 2]
Wang, Zhongyuan [1, 2]
Hu, Ruimin [1, 3]
Wang, Xiaochen [1, 3]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan 430072, Peoples R China
[3] Wuhan Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Multi-speaker separation; Isolated speech segments; Deep embedding network; Attractor point; SOUND SOURCE SEPARATION;
DOI
10.1007/s11063-022-10887-6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In real multi-speaker scenarios, the signal collected by a microphone contains many time periods in which only one speaker is talking; these are called isolated speech segments. Based on this observation, this paper proposes a single-channel multi-speaker speech separation method based on the similarity between a speaker feature center and the mixture features in the deep embedding space. Specifically, the isolated speech segments extracted from the observed signal are converted to deep embedding vectors, from which a speaker feature center is created. The similarity between this center and the deep embedding features of the mixture is used to construct a mask for the corresponding speaker, which is then used to separate that speaker's speech. A residual-based deep embedding network with stacked 2-D convolutional blocks, used in place of bi-directional long short-term memory, is proposed for faster speed and better feature extraction. In addition, an isolated speech segment extraction method based on Chimera++ is proposed, because previous experiments showed that the Chimera++ algorithm achieves good separation performance on segments from only one speaker. Evaluation results on the general datasets show that the proposed method substantially outperforms competing algorithms, by up to 0.94 dB in Signal-to-Distortion Ratio.
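The center-and-similarity idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state which similarity function or normalization is used, so cosine similarity followed by a softmax over speakers is an assumption here, and the names `speaker_center`, `similarity_masks`, and the `scale` parameter are hypothetical.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for all-zero embeddings


def speaker_center(embeddings):
    """Return a unit-length center for one speaker.

    embeddings: (N, D) array of embedding vectors taken from that
    speaker's isolated speech segments (assumed already extracted).
    """
    unit = embeddings / (np.linalg.norm(embeddings, axis=-1, keepdims=True) + EPS)
    center = unit.mean(axis=0)
    return center / (np.linalg.norm(center) + EPS)


def similarity_masks(mixture_emb, centers, scale=10.0):
    """Turn center/mixture similarity into soft masks.

    mixture_emb: (M, D) embeddings, one per time-frequency bin of the mixture.
    centers:     (S, D) unit-length speaker centers.
    scale:       hypothetical sharpness factor for the softmax (assumption).
    Returns an (M, S) array whose rows sum to 1.
    """
    unit = mixture_emb / (np.linalg.norm(mixture_emb, axis=-1, keepdims=True) + EPS)
    sim = scale * (unit @ centers.T)             # cosine similarity, (M, S)
    sim = sim - sim.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(sim)
    return weights / weights.sum(axis=1, keepdims=True)


# Toy 2-D embeddings standing in for network outputs.
seg_a = np.array([[1.0, 0.1], [0.9, 0.0]])   # isolated segment, speaker A
seg_b = np.array([[0.0, 1.0], [0.1, 0.8]])   # isolated segment, speaker B
mixture = np.array([[1.0, 0.2], [0.1, 1.0], [0.7, 0.7]])

centers = np.stack([speaker_center(seg_a), speaker_center(seg_b)])
masks = similarity_masks(mixture, centers)
```

In a full system, each column of `masks` would be reshaped to the spectrogram's time-frequency grid and multiplied with the mixture magnitude to estimate one speaker's speech.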
Pages: 385-400
Page count: 16
Related papers
50 in total
  • [21] Effect of speech priors in single-channel speech-music separation for ASR
    Demir, Cemil
    Cemgil, A. Taylan
    Saraclar, Murat
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1234 - 1237
  • [22] Learning a Discriminative Dictionary for Single-Channel Speech Separation
    Bao, Guangzhao
    Xu, Yangfei
    Ye, Zhongfu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (07) : 1130 - 1138
  • [23] SINGLE-CHANNEL SPEECH SEPARATION INTEGRATING PITCH INFORMATION BASED ON A MULTI TASK LEARNING FRAMEWORK
    Li, Xiang
    Liu, Rui
    Song, Tao
    Wu, Xihong
    Chen, Jing
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7279 - 7283
  • [24] Single-channel speech separation using combined EMD and speech-specific information
    Prasanna Kumar M.K.
    Kumaraswamy R.
    International Journal of Speech Technology, 2017, 20 (4) : 1037 - 1047
  • [25] An Improved Unsupervised Single-Channel Speech Separation Algorithm for Processing Speech Sensor Signals
    Jiang, Dazhi
    He, Zhihui
    Lin, Yingqing
    Chen, Yifei
    Xu, Linyan
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021
  • [26] Multi-Head Self-Attention-Based Deep Clustering for Single-Channel Speech Separation
    Jin, Yanliang
    Tang, Chenjun
    Liu, Qianhong
    Wang, Yan
    IEEE ACCESS, 2020, 8 : 100013 - 100021
  • [27] A VQ-based Single-Channel Audio Separation for Music/Speech Mixtures
    Asgari, Meysam
    Fallah, Mahdi
    Mehrizi, Elahe Abouie
    Mostafavi, Ali
    UKSIM 2009: ELEVENTH INTERNATIONAL CONFERENCE ON COMPUTER MODELLING AND SIMULATION, 2009, : 223 - +
  • [28] Deep clustering-based single-channel speech separation and recent advances
    Aihara, Ryo
    Wichern, Gordon
    Le Roux, Jonathan
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2020, 41 (02) : 465 - 471
  • [29] Single-channel Speech Enhancement Student under Multi-channel Speech Enhancement Teacher
    Zhang, Yuzhu
    Zhang, Hui
    Zhang, Xueliang
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 372 - 377
  • [30] Deep Clustering in Complex Domain for Single-Channel Speech Separation
    Liu, Runling
    Tang, Yu
    Mang, Hongwei
    2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 1463 - 1468