A hybrid barge-in procedure for more reliable turn-taking in human-machine dialog systems

Cited by: 10
Authors
Rose, RC [1]
Kim, HK [1]
Affiliation
[1] AT&T Labs Res, Florham Pk, NJ 07932 USA
Source
ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03 | 2003
DOI
10.1109/ASRU.2003.1318428
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates techniques designed to allow the users of human-machine dialog systems to interrupt, or barge in over, machine-generated speech messages. An experimental study was performed on utterances collected from a telephone-based dialog system to analyze the effect of barge-in performance on users' speech. One result of this study was that excessive barge-in latencies resulted in disfluencies appearing in over half of users' utterances. A hybrid procedure for barge-in detection is proposed and evaluated on utterances collected from the same domain. The procedure combines a feature-based voice activity detection (VAD) algorithm with a model-based approach for verifying hypothesized speech segments. The procedure is shown in the paper to obtain better detection performance than procedures that rely on the speech recognition decoder to detect speech. It is also found to have latencies comparable to those obtained by low-delay feature-based speech detection algorithms.
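The two-stage idea in the abstract (a fast feature-based detector proposes a speech segment, then a model-based check accepts or rejects it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the log-energy feature, the single-Gaussian speech and background models, every threshold and default value, and all function names are hypothetical, and the sketch runs offline on a whole buffer rather than on a live audio stream.

```python
import math
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int]      # (first frame, one past last frame)
Gaussian = Tuple[float, float]  # (mean, variance) of log energy in dB


def frame_log_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Split the signal into non-overlapping frames and return log energy (dB)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-10))
    return energies


def hypothesize_segment(energies: Sequence[float], threshold_db: float,
                        min_frames: int) -> Optional[Segment]:
    """Feature-based stage (assumed): first run of at least min_frames
    consecutive frames whose energy exceeds threshold_db."""
    run_start = None
    for i, energy in enumerate(energies):
        if energy > threshold_db:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_frames:
                end = i + 1
                while end < len(energies) and energies[end] > threshold_db:
                    end += 1
                return (run_start, end)
        else:
            run_start = None
    return None


def gaussian_log_pdf(x: float, model: Gaussian) -> float:
    mean, var = model
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def verify_segment(energies: Sequence[float], segment: Segment,
                   speech: Gaussian, background: Gaussian,
                   llr_threshold: float) -> bool:
    """Model-based stage (assumed): accept the segment if the average
    per-frame log-likelihood ratio, speech vs. background, exceeds the threshold."""
    start, end = segment
    llr = sum(gaussian_log_pdf(e, speech) - gaussian_log_pdf(e, background)
              for e in energies[start:end])
    return llr / (end - start) > llr_threshold


def detect_barge_in(samples: Sequence[float], frame_len: int = 80,
                    threshold_db: float = -30.0, min_frames: int = 3,
                    speech: Gaussian = (-15.0, 36.0),
                    background: Gaussian = (-50.0, 36.0),
                    llr_threshold: float = 0.0) -> Optional[Segment]:
    """Return the verified speech segment in frames, or None if nothing is accepted."""
    energies = frame_log_energies(samples, frame_len)
    segment = hypothesize_segment(energies, threshold_db, min_frames)
    if segment is None:
        return None
    if not verify_segment(energies, segment, speech, background, llr_threshold):
        return None
    return segment
```

In this sketch the cheap first stage keeps detection delay low, while the second stage gives a place to reject non-speech events that happen to be loud; raising `llr_threshold` trades missed barge-ins for fewer false interruptions of the prompt.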
Pages: 198-203
Page count: 6