A Comparative Study on the Effect of Different Codecs on Speech Recognition Accuracy Using Various Acoustic Modeling Techniques

Cited by: 0

Authors:

Raghavan, Srinivasa [1]

Meenakshi, Nisha [1]

Mittal, Sanjeev Kumar [1]

Yarra, Chiranjeevi [1]

Mandal, Anupam [2]

Kumar, K. R. Prasanna [2]

Ghosh, Prasanta Kumar [1]

Affiliations:

[1] Indian Inst Sci IISc, Elect Engn, Bangalore 560012, Karnataka, India

[2] Ctr AI & Robot, Bangalore 560093, Karnataka, India

Source:

2017 TWENTY-THIRD NATIONAL CONFERENCE ON COMMUNICATIONS (NCC) | 2017

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

In this work, we study the effect of codec-induced distortion on speech recognition performance on the TIMIT corpus using eleven codecs and five acoustic modeling techniques (AMTs), including several state-of-the-art methods. The study covers a single round of encoding-decoding as well as various tandem scenarios. Experiments in the single encoding-decoding case reveal that the acoustic model from G.711A, a narrowband high-bit-rate codec, yields a lower phone error rate (PER) than the low-bit-rate codecs for most AMTs. Among the eleven codec-based acoustic models, those from the G.711A, G.728, G.729B, AMR-WB and G.729A codecs consistently give the five lowest PERs across AMTs. The model trained on 'clean' speech data (PCM) performs worse than these five codec-based acoustic models in three of the five AMTs. These five models are then used in six different tandem scenarios comprising three unseen codecs. As in the single encoding-decoding case, the PER in each tandem scenario is consistently lowest for all AMTs when the acoustic model from the G.711A codec is used. However, when the acoustic model is trained with mixed speech data from all tandem scenarios, the PER is lower than in the matched condition for four of the five AMTs.
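The abstract reports results as phone error rate (PER) but does not spell out how it is scored. A minimal sketch of the conventional definition follows, assuming PER is the Levenshtein (edit) distance between the reference and recognized phone sequences, summed over utterances and divided by the total number of reference phones. The function names `edit_distance` and `phone_error_rate` and the example phone strings are illustrative only and are not taken from the paper; any phone-set folding applied before scoring is not modeled here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a hypothesis phone sequence."""
    # prev[j] holds the distance between the first i-1 reference phones
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference phone
                            curr[j - 1] + 1,      # insertion of a hypothesis phone
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """PER in percent: total edit errors over total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_phones


# Hypothetical utterance: one substitution (ae -> ah) and one deletion (final sil).
reference = ["sil", "dh", "ax", "k", "ae", "t", "sil"]
hypothesis = ["sil", "dh", "ax", "k", "ah", "t"]
per = phone_error_rate([reference], [hypothesis])
print(f"PER = {per:.2f}%")
```

Under this definition a lower PER means fewer substitution, deletion and insertion errors per reference phone, which is the sense in which the abstract compares codec-specific acoustic models.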

Pages: 6
