Graceful degradation of speech recognition performance over packet-erasure networks

Cited by: 24
Authors
Boulis, C [1]
Ostendorf, M [1]
Riskin, EA [1]
Otterson, S [1]
Affiliations
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2002 / Vol. 10 / No. 8
Funding
National Science Foundation (USA);
Keywords
bit allocation; forward error correction; packet loss; speech recognition; unequal loss protection;
DOI
10.1109/TSA.2002.804532
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper explores packet loss recovery for automatic speech recognition (ASR) in spoken dialog systems, assuming an architecture in which a lightweight client communicates with a remote ASR server. Speech is transmitted with source and channel codes optimized for the ASR application, i.e., to minimize word error rate. Unequal amounts of forward error correction, depending on the data's effect on ASR performance, are assigned to protect against packet loss. Experiments with simulated packet loss in a range of loss conditions are conducted on the DARPA Communicator (air travel information) task. Results show that the approach provides robust ASR performance which degrades gracefully as packet loss rates increase. Transmitting at 5.2 kbps with up to 200 ms of added delay leads to only a 7% relative degradation in word error rate, even under extremely adverse network conditions.
Pages: 580-590
Page count: 11