Mask-based blind source separation and MVDR beamforming in ASR

Cited by: 3

Authors:

He, Renke [1]

Long, Yanhua [1]

Li, Yijie [2]

Liang, Jiaen [2]

Affiliations:

[1] Shanghai Normal Univ, Dept Elect & Informat Engn, Shanghai 200234, Peoples R China

[2] Unisound AI Technol Co Ltd, Beijing 100089, Peoples R China

Source:

International Journal of Speech Technology | 2020 / Vol. 23 / Issue 01

Funding:

National Natural Science Foundation of China

Keywords:

Cocktail party problem; MVDR; BSS; T-F masking; Speech enhancement; Speech separation; Mixtures

DOI:

10.1007/s10772-019-09666-x

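Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809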

Abstract:

This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem: recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with a minimum variance distortionless response (MVDR) beamformer for speech enhancement and source separation on the recent CHiME-5 challenge. In our experiments, we found that the time-frequency (T-F) mask estimation strategy based on the BSS algorithm should differ between speech enhancement and source separation. The main difference is whether background noise needs to be treated as an additional class during T-F mask estimation. Experimental results show that the proposed framework improves speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative word error rate (WER) reduction over the official baseline system by improving only the front-end speech enhancement framework.
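As a rough illustration of how T-F masks can drive an MVDR beamformer, the sketch below uses the widely known reference-channel MVDR formulation, w = (Φ_n⁻¹ Φ_s u) / tr(Φ_n⁻¹ Φ_s), with mask-weighted spatial covariance matrices. It is not taken from the paper: the function name `mask_based_mvdr`, the parameters `ref_mic` and `eps`, and the array layout are assumptions made for this example, and the masks are assumed to be supplied by some separate estimator. Whether the noise mask covers only interfering speakers or also a background-noise class is left to the caller.

```python
import numpy as np

def mask_based_mvdr(stft, target_mask, noise_mask, ref_mic=0, eps=1e-6):
    """Hypothetical helper: reference-channel MVDR driven by T-F masks.

    stft:        complex array, shape (mics, frames, freqs)
    target_mask: real array in [0, 1], shape (frames, freqs)
    noise_mask:  real array in [0, 1], shape (frames, freqs)
    Returns the beamformed STFT, shape (frames, freqs).
    """
    n_mics, n_frames, n_freqs = stft.shape
    enhanced = np.zeros((n_frames, n_freqs), dtype=complex)
    for f in range(n_freqs):
        y = stft[:, :, f]  # (mics, frames)
        # Mask-weighted spatial covariance matrices: sum_t m(t) y(t) y(t)^H
        phi_s = (target_mask[:, f] * y) @ y.conj().T / (target_mask[:, f].sum() + eps)
        phi_n = (noise_mask[:, f] * y) @ y.conj().T / (noise_mask[:, f].sum() + eps)
        # Small diagonal loading keeps the noise covariance invertible (assumed value)
        phi_n = phi_n + eps * np.eye(n_mics)
        # phi_n^{-1} phi_s without forming an explicit inverse
        numerator = np.linalg.solve(phi_n, phi_s)
        # Select the reference-microphone column and normalize by the trace
        w = numerator[:, ref_mic] / (np.trace(numerator) + eps)
        # Apply the beamformer: w^H y for every frame
        enhanced[:, f] = w.conj() @ y
    return enhanced
```

With a rank-one target covariance, this beamformer passes the target as observed at the reference microphone while minimizing the output power of whatever the noise mask selects.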


Pages: 133-140

Page count: 8
