SuperFormer: Enhanced Multi-Speaker Speech Separation Network Combining Channel and Spatial Adaptability

Cited by: 0
Authors
Jiang, Yanji [1 ,2 ]
Qiu, Youli [1 ]
Shen, Xueli [1 ]
Sun, Chuan [2 ,3 ]
Liu, Haitao [2 ]
Affiliations
[1] Liaoning Tech Univ, Sch Software, Huludao 125105, Peoples R China
[2] Tsinghua Univ, Suzhou Automot Res Inst, Suzhou 215100, Peoples R China
[3] Hong Kong Polytech Univ, Dept Civil & Environm Engn, Hung Hom, Kowloon, Hong Kong, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, Issue 15
Funding
National Natural Science Foundation of China;
Keywords
multi-speaker separation; speech separation; transformer; speaker enhancement; adaptive network;
DOI
10.3390/app12157650
CLC Classification Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Speech separation is a hot topic in multi-speaker speech recognition. Modelling the long-term autocorrelation of speech signal sequences is essential for speech separation. The key challenges are learning the intra-speaker autocorrelation of each speaker's speech effectively, modelling both the local (intra-block) and global (intra- and inter-block) dependence features of the speech sequence, and achieving real-time separation with as few parameters as possible. In this paper, the local and global dependence features of the speech sequence are extracted with different transformer structures. A forward adaptive module combining channel and spatial autocorrelation is proposed to give the separation model good channel adaptability (channel-adaptive modelling) and spatial adaptability (space-adaptive modelling). In addition, a speaker enhancement module at the back end of the separation model further enhances or suppresses the speech of different speakers by exploiting the mutual-suppression characteristics of the source signals. Experiments show that the proposed separation network achieves a better scale-invariant signal-to-noise ratio improvement (SI-SNRi) on the public WSJ0-2mix corpus than the baseline models. The proposed method offers a solution for speech separation and speech recognition in multi-speaker scenarios.
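The SI-SNRi metric reported above is a standard evaluation measure rather than something specific to this paper. As context for the abstract, here is a minimal NumPy sketch of how SI-SNR and its improvement over the unprocessed mixture are typically computed (this is not the authors' code; function names are illustrative):

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB."""
    # Zero-mean both signals so the measure is invariant to DC offset
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target: the "clean" component
    s_target = np.dot(estimate, target) * target / (np.dot(target, target) + eps)
    # Everything orthogonal to the target counts as noise
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

Because the target projection absorbs any rescaling, a scaled copy of the reference scores near-perfectly, which is why the metric is preferred over plain SNR for separation systems trained without gain constraints.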
Pages: 15
Related Papers (50 records)
  • [1] Single Channel multi-speaker speech Separation based on quantized ratio mask and residual network
    Ke, Shanfa
    Hu, Ruimin
    Wang, Xiaochen
    Wu, Tingzhao
    Li, Gang
    Wang, Zhongyuan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (43-44) : 32225 - 32241
  • [2] SOURCE-AWARE CONTEXT NETWORK FOR SINGLE-CHANNEL MULTI-SPEAKER SPEECH SEPARATION
    Li, Zeng-Xi
    Song, Yan
    Dai, Li-Rong
    McLoughlin, Ian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 681 - 685
  • [3] A unified network for multi-speaker speech recognition with multi-channel recordings
    Liu, Conggui
    Inoue, Nakamasa
    Shinoda, Koichi
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1304 - 1307
  • [4] Multi-channel multi-speaker transformer for speech recognition
    Guo, Yifan
    Tian, Yao
    Suo, Hongbin
    Wan, Yulong
    INTERSPEECH 2023, 2023, : 4918 - 4922
  • [5] Single-speaker/multi-speaker co-channel speech classification
    Rossignol, Stephane
    Pietquin, Olivier
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2322 - 2325
  • [6] A Multi-channel/Multi-speaker Articulatory Database in Mandarin for Speech Visualization
    Zhang, Dan
    Liu, Xianqian
    Yan, Nan
    Wang, Lan
    Zhu, Yun
    Chen, Hui
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 299 - +
  • [7] Single-Channel Multi-Speaker Separation using Deep Clustering
    Isik, Yusuf
    Le Roux, Jonathan
    Chen, Zhuo
    Watanabe, Shinji
    Hershey, John R.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 545 - 549
  • [8] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [9] MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation
    Li, Xiyun
    Xu, Yong
    Yu, Meng
    Zhang, Shi-Xiong
    Xu, Jiaming
    Xu, Bo
    Yu, Dong
    INTERSPEECH 2021, 2021, : 1119 - 1123