EARLY FUSION OF SPARSE CLASSIFICATION AND GMM FOR NOISE ROBUST ASR
Cited by: 0
Authors:
Sun, Yang [1]
Gemmeke, Jort F. [1]
Cranen, Bert [1]
ten Bosch, Louis [1]
Boves, Lou [1]
Affiliations:
[1] Radboud Univ Nijmegen, Ctr Language & Speech Technol, NL-6525 ED Nijmegen, Netherlands
Source:
19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011) | 2011
Keywords:
SPEECH RECOGNITION;
DOI:
Not available
Chinese Library Classification (CLC):
TM [Electrical engineering];
TN [Electronic technology, communication technology];
Discipline classification codes:
0808;
0809;
Abstract:
In previous work we showed that an ASR system built on a dual-input DBN, which simultaneously observes MFCC acoustic features and predicted phone labels generated by an exemplar-based Sparse Classification (SC) system, achieves better word recognition accuracy in noise than a system observing only one of those input streams. This paper explores two modifications of the SC input to further improve the noise robustness of the dual-input DBN system: 1) integrating more time context and 2) using the N best states. Experiments on AURORA-2 show that the first modification significantly improves recognition results at almost all SNRs, particularly in the noisier conditions, with an absolute accuracy gain of up to 6.1% at an SNR of -5 dB. The second modification shows that there is an optimal N, which improves the maximum attainable accuracy by a further 11.8% at -5 dB.