NON-NEGATIVE MATRIX FACTORIZATION AS NOISE-ROBUST FEATURE EXTRACTOR FOR SPEECH RECOGNITION

Cited by: 25
Authors
Schuller, Bjoern [1]
Weninger, Felix [1]
Woellmer, Martin [1]
Sun, Yang [1]
Rigoll, Gerhard [1]
Affiliation
[1] Tech Univ Munich, Inst Human Machine Commun, D-80333 Munich, Germany
Source
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010
Keywords
Non-Negative Matrix Factorization; Speech recognition; Noise robustness; Dynamic Bayesian Networks; Long Short-Term Memory
DOI
10.1109/ICASSP.2010.5495567
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We introduce a novel approach for noise-robust feature extraction in speech recognition, based on non-negative matrix factorization (NMF). While NMF has previously been used for speech denoising and speaker separation, we directly extract time-varying features from the NMF output. To this end, we extend basic unsupervised NMF to a hybrid supervised/unsupervised algorithm. We present a Dynamic Bayesian Network (DBN) architecture that can exploit these features in a Tandem manner together with the maximum likelihood phoneme estimate of a bidirectional long short-term memory (BLSTM) recurrent neural network. We show that adding NMF features to spelling recognition systems can increase word accuracy by up to 7% absolute in a noisy car environment.
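The hybrid supervised/unsupervised NMF idea in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation, and several details are assumptions not stated in this record: the spectrogram V is factorized as V ≈ [W_speech | W_noise] H, the speech dictionary W_speech is held fixed (supervised part) while the noise bases W_noise and the activations H are learned (unsupervised part), the cost is the squared Euclidean distance minimized with Lee-Seung multiplicative updates, and the per-frame features are the speech activations normalized to sum to one. The function names semi_supervised_nmf and nmf_features, the random stand-in data, and all parameter values are hypothetical.

```python
import numpy as np


def semi_supervised_nmf(V, W_speech, n_noise=2, n_iter=200, eps=1e-9, seed=0):
    """Factorize V ~= [W_speech | W_noise] @ H (all entries non-negative).

    Hypothetical sketch: W_speech is held fixed (supervised part); W_noise
    and H are learned (unsupervised part) with Lee-Seung multiplicative
    updates for the squared Euclidean cost ||V - W H||^2.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_s = W_speech.shape[1]
    W_noise = rng.random((n_freq, n_noise)) + eps
    H = rng.random((k_s + n_noise, n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        # Update the activations of all components (speech and noise).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Update only the noise bases; the speech dictionary stays fixed.
        H_noise = H[k_s:]
        W_noise *= (V @ H_noise.T) / (W @ H @ H_noise.T + eps)
    return W_noise, H


def nmf_features(H, k_s, eps=1e-9):
    """Per-frame speech activations, normalized to sum to 1 in each frame."""
    H_speech = H[:k_s]
    return H_speech / (H_speech.sum(axis=0, keepdims=True) + eps)


# Usage with random stand-in data (a real system would use a magnitude
# spectrogram and a speech dictionary learned beforehand from clean speech).
rng = np.random.default_rng(1)
W_speech = rng.random((64, 8))   # 64 frequency bins, 8 speech components
V = rng.random((64, 100))        # 64 bins x 100 frames
W_noise, H = semi_supervised_nmf(V, W_speech, n_noise=2)
F = nmf_features(H, k_s=8)       # one 8-dimensional feature vector per frame
print(F.shape)                   # (8, 100)
```

In this sketch, holding W_speech fixed is what lets each row of F track one known speech component over time, while the freely learned W_noise can absorb energy that the speech dictionary does not explain. How the paper actually builds its dictionary and post-processes the activations before passing them to the DBN is not specified here.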
Pages: 4562-4565
Number of pages: 4
Related papers (50 in total)
  • [1] NON-NEGATIVE MATRIX FACTORIZATION FOR HIGHLY NOISE-ROBUST ASR: TO ENHANCE OR TO RECOGNIZE?
    Weninger, Felix
    Woellmer, Martin
    Geiger, Juergen
    Schuller, Bjoern
    Gemmeke, Jort F.
    Hurmalainen, Antti
    Virtanen, Tuomas
    Rigoll, Gerhard
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4681 - 4684
  • [2] NON-NEGATIVE MATRIX DECONVOLUTION IN NOISE ROBUST SPEECH RECOGNITION
    Hurmalainen, Antti
    Gemmeke, Jort
    Virtanen, Tuomas
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4588 - 4591
  • [3] Exploiting Non-negative Matrix Factorization with Linear Constraints in Noise-Robust Speaker Identification
    Lyubimov, Nikolay
    Nastasenko, Marina
    Kotov, Mikhail
    Doroshin, Danila
    SPEECH AND COMPUTER, 2014, 8773 : 200 - 208
  • [4] Robust Non-negative Matrix Factorization with β-Divergence for Speech Separation
    Li, Yinan
    Zhang, Xiongwei
    Sun, Meng
    ETRI JOURNAL, 2017, 39 (01) : 21 - 29
  • [5] Noise-Robust Voice Conversion Based on Sparse Spectral Mapping Using Non-negative Matrix Factorization
    Aihara, Ryo
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06) : 1411 - 1418
  • [6] NOISE-ROBUST VOICE CONVERSION USING A SMALL PARALLEL DATA BASED ON NON-NEGATIVE MATRIX FACTORIZATION
    Aihara, Ryo
    Fujii, Takao
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 315 - 319
  • [7] LEARNING SPEECH FEATURES IN THE PRESENCE OF NOISE: SPARSE CONVOLUTIVE ROBUST NON-NEGATIVE MATRIX FACTORIZATION
    de Frein, Ruairi
    Rickard, Scott T.
    2009 16TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009, : 1248 - 1253
  • [8] A supervised non-negative matrix factorization model for speech emotion recognition
    Hou, Mixiao
    Li, Jinxing
    Lu, Guangming
    SPEECH COMMUNICATION, 2020, 124 : 13 - 20
  • [9] SPEECH EMOTION RECOGNITION USING TRANSFER NON-NEGATIVE MATRIX FACTORIZATION
    Song, Peng
    Ou, Shifeng
    Zheng, Wenming
    Jin, Yun
    Zhao, Li
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5180 - 5184
  • [10] Feature Weighted Non-Negative Matrix Factorization
    Chen, Mulin
    Gong, Maoguo
    Li, Xuelong
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (02) : 1093 - 1105