Mask-based blind source separation and MVDR beamforming in ASR

Cited by: 3
Authors
He, Renke [1]
Long, Yanhua [1]
Li, Yijie [2]
Liang, Jiaen [2]
Affiliations
[1] Shanghai Normal Univ, Dept Elect & Informat Engn, Shanghai 200234, Peoples R China
[2] Unisound AI Technol Co Ltd, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cocktail party problem; MVDR; BSS; T-F masking; Speech enhancement; SPEECH SEPARATION; MIXTURES;
DOI
10.1007/s10772-019-09666-x
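Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract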
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem: recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed for this problem. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with the minimum variance distortionless response (MVDR) beamformer for speech enhancement and source separation in the recent CHiME-5 challenge. In our experiments, we found that the time-frequency (T-F) mask estimation strategy based on the BSS algorithm should differ between speech enhancement and source separation. The main difference is whether background noise needs to be accounted for as an additional class during T-F mask estimation. Experimental results showed that the proposed framework improved speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative WER reduction over the official baseline system by improving only the front-end speech enhancement framework.
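As a rough illustration of how T-F masks can drive an MVDR beamformer, the following minimal sketch estimates speech and noise spatial covariance matrices from masks and derives the filter weights. It is not the authors' implementation: the abstract does not specify the MVDR formulation, so the reference-channel form w = (Φ_nn⁻¹ Φ_ss) u / tr(Φ_nn⁻¹ Φ_ss) is an assumption, as are the function names, the array shapes, and the diagonal-loading constant. The masks are taken as given here; in the framework described above they would come from the BSS stage.

```python
import numpy as np


def masked_psd(stft, mask, eps=1e-10):
    """Mask-weighted spatial covariance per frequency bin.

    stft: complex array of shape (F, T, C) (frequency, time, channel).
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    weighted = stft * mask[..., None]
    # Sum over time of mask * y y^H for every frequency bin.
    psd = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    return psd / (mask.sum(axis=1)[:, None, None] + eps)


def mvdr_weights(psd_speech, psd_noise, ref_channel=0, diag_load=1e-6):
    """Reference-channel MVDR weights of shape (F, C) (assumed formulation)."""
    num_channels = psd_noise.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    scale = np.trace(psd_noise, axis1=-2, axis2=-1).real / num_channels
    loaded = psd_noise + diag_load * scale[:, None, None] * np.eye(num_channels)
    # Solve Phi_nn X = Phi_ss for each frequency instead of inverting explicitly.
    numerator = np.linalg.solve(loaded, psd_speech)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[..., ref_channel] / (trace + 1e-10)


def apply_beamformer(weights, stft):
    """Apply w^H y at every T-F bin; returns an array of shape (F, T)."""
    return np.einsum("fc,ftc->ft", weights.conj(), stft)
```

Under these assumptions, enhancement would use one speech mask and one noise mask, while separation would call `mvdr_weights` once per source, treating the remaining sources (and, where modeled, the noise class) as the interference covariance.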
Pages: 133-140
Number of pages: 8