Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Citations: 0
Authors
Sakuma, Jin [1]
Fujie, Shinya [1, 2]
Zhao, Huaibo [1]
Kobayashi, Tetsunori [1]
Affiliations
[1] Waseda Univ, Tokyo, Japan
[2] Chiba Inst Technol, Chiba, Japan
Source
Keywords
spoken dialog systems; turn-taking; response timing; streaming ASR
DOI
10.21437/Interspeech.2023-1618
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In conversational systems, proper timing of the system's response is critical to maintaining a comfortable conversation. To estimate response timing appropriately, it is important to know what the user has said, including their most recent words, but ASR delay usually prevents use of the full user utterance. In this paper, we attempted to employ an extremely low-latency ASR model, Multi-Look-Ahead ASR by Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examined the effectiveness of using low-latency ASR in combination with a parameter called Estimates of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated the approach on a Japanese simulated dialog database of a restaurant information center. The results confirmed that reducing ASR delay improves the accuracy of response timing estimation. This effect also appeared when the ESC-based method was combined with low-latency ASR.
Pages: 2668-2672
Page count: 5