NON-NEGATIVE MATRIX FACTORIZATION AS NOISE-ROBUST FEATURE EXTRACTOR FOR SPEECH RECOGNITION

Cited by: 25

Authors:

Schuller, Bjoern [1]

Weninger, Felix [1]

Woellmer, Martin [1]

Sun, Yang [1]

Rigoll, Gerhard [1]

Affiliations:

[1] Tech Univ Munich, Inst Human Machine Commun, D-80333 Munich, Germany

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

Non-Negative Matrix Factorization; Speech recognition; Noise robustness; Dynamic Bayesian Networks; Long Short-Term Memory;

DOI:

10.1109/ICASSP.2010.5495567

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

We introduce a novel approach for noise-robust feature extraction in speech recognition, based on non-negative matrix factorization (NMF). While NMF has previously been used for speech denoising and speaker separation, we directly extract time-varying features from the NMF output. To this end we extend basic unsupervised NMF to a hybrid supervised/unsupervised algorithm. We present a Dynamic Bayesian Network (DBN) architecture that can exploit these features in a Tandem manner together with the maximum likelihood phoneme estimate of a bidirectional long short-term memory (BLSTM) recurrent neural network. We show that addition of NMF features to spelling recognition systems can increase word accuracy by up to 7% absolute in a noisy car environment.
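The hybrid supervised/unsupervised idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `semi_supervised_nmf`, its parameters, the Euclidean multiplicative updates, and the choice of using the activations of the fixed bases as per-frame features are all assumptions made for illustration. Some basis columns (`W_fixed`, hypothetically learned in advance from clean speech) are held constant, while additional free columns adapt to whatever else is in the signal, such as noise.

```python
import numpy as np

def semi_supervised_nmf(V, W_fixed, n_free, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (bins x frames) as [W_fixed, W_free] @ H.

    W_fixed is kept constant (supervised part); W_free and H are
    estimated with Euclidean multiplicative updates (unsupervised part).
    Illustrative sketch only, not the method from the paper.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    k_fixed = W_fixed.shape[1]
    # Random non-negative initialisation of the free bases and activations.
    W_free = rng.random((n_bins, n_free)) + eps
    H = rng.random((k_fixed + n_free, n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_fixed, W_free])
        # Update all activations.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update only the free basis columns; the fixed ones stay as given.
        H_free = H[k_fixed:]
        W_free *= (V @ H_free.T) / (W @ H @ H_free.T + eps)
    return np.hstack([W_fixed, W_free]), H

# Toy usage with synthetic data (all values are made up).
rng = np.random.default_rng(1)
W_speech = rng.random((16, 3))            # hypothetical pre-trained bases
W_noise = rng.random((16, 1))             # unknown to the algorithm
V = np.hstack([W_speech, W_noise]) @ rng.random((4, 20))
W, H = semi_supervised_nmf(V, W_speech, n_free=1)
features = H[:W_speech.shape[1]]          # one activation vector per frame
```

In this sketch, each column of `features` is a time-varying feature vector that could be passed to a downstream recognizer; how the paper normalizes or combines such features with the BLSTM phoneme estimate is not specified here.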

Pages: 4562-4565

Page count: 4
