Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Cited by: 0

Authors:

Sakuma, Jin ^{[1]}

Fujie, Shinya ^{[1,2]}

Zhao, Huaibo ^{[1]}

Kobayashi, Tetsunori ^{[1]}

Affiliations:

[1] Waseda Univ, Tokyo, Japan

[2] Chiba Inst Technol, Chiba, Japan

Source:

INTERSPEECH 2023 | 2023

Keywords:

spoken dialog systems; turn-taking; response timing; streaming ASR

DOI:

10.21437/Interspeech.2023-1618

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. To estimate appropriate timing, it is important to know what the user has said, including the most recent words, but ASR delay usually prevents use of the full user utterance. In this paper, we employed an extremely low-latency ASR model, Multi-Look-Ahead ASR by Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examined the effectiveness of combining low-latency ASR with a parameter called Estimates of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated the approach on a Japanese simulated dialog database of a restaurant information center. The results confirmed that reducing ASR delay improves the accuracy of response timing estimation. This effect also appeared when the ESC-based method was combined with low-latency ASR.


Pages: 2668-2672

Page count: 5
