Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Cited by: 0
Authors
Sakuma, Jin [1]
Fujie, Shinya [1,2]
Zhao, Huaibo [1]
Kobayashi, Tetsunori [1]
Affiliations
[1] Waseda Univ, Tokyo, Japan
[2] Chiba Inst Technol, Chiba, Japan
Keywords
spoken dialog systems; turn-taking; response timing; streaming ASR
DOI
10.21437/Interspeech.2023-1618
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. To achieve appropriate timing estimation, it is important to know what the users have said, including their most recent words, but ASR delay usually prevents the use of full user utterance. In this paper, we attempted to employ an extremely low latency ASR model called Multi-Look-Ahead ASR by Zhao et al. to enable near full utterance for response timing estimation. Additionally, we examined the effectiveness of using low latency ASR in combination with a parameter called Estimates of Syntactic Completeness (ESC), which indicates how soon the user's speech is completed. We evaluated on a Japanese simulated dialog database of a restaurant information center. The results confirmed that reducing ASR delay improves the accuracy of response timing estimation. This effect also appeared when the method using ESC is combined with the use of low latency ASR.
Pages: 2668-2672
Number of pages: 5
Related papers
50 in total
  • [11] Improving Named Entity Recognition in Spoken Dialog Systems by Context and Speech Pattern Modeling
    Minh Nguyen
    Yu, Zhou
    SIGDIAL 2021: 22ND ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2021), 2021, : 45 - 55
  • [12] RESPONSE TIMING ESTIMATION FOR SPOKEN DIALOG SYSTEMS BASED ON SYNTACTIC COMPLETENESS PREDICTION
    Sakuma, Jin
    Fujie, Shinya
    Kobayashi, Tetsunori
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 369 - 374
  • [13] Improving Long Distance Slot Carryover in Spoken Dialogue Systems
    Chen, Tongfei
    Naik, Chetan
    He, Hua
    Rastogi, Pushpendre
    Mathias, Lambert
    NLP FOR CONVERSATIONAL AI, 2019, : 96 - 105
  • [14] Recognition of Paralinguistic Information in Spoken Dialogue Systems for Elderly People
    Perez-Espinosa, Humberto
    Martinez-Miranda, Juan
    ADVANCES IN ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, MICAI 2015, PT I, 2015, 9413 : 107 - 117
  • [15] Enhancement of Spoken Dialogue Systems by Means of User Emotion Recognition
    Lopez-Cozar, Ramon
    Silovsky, Jan
    Griol, David
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (45): : 191 - 198
  • [16] Age Recognition for Spoken Dialogue Systems: Do We Need It?
    Wolters, Maria
    Vipperla, Ravichander
    Renals, Steve
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1435 - 1438
  • [17] Evaluating speech recognition in the context of a spoken dialogue system: Critical error rate
    Damnati, G
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001, : 280 - 283
  • [18] The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems
    Liesenfeld, Andreas
    Lopez, Alianda
    Dingemanse, Mark
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023, : 482 - 495
  • [19] A Framework of Reply Speech Generation for Concept-to-Speech Conversion in Spoken Dialogue Systems
    Takada, Seiya
    Yagi, Yuji
    Hirose, Keikichi
    Minematsu, Nobuaki
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 677 - +
  • [20] Reinforcement learning for parameter estimation in statistical spoken dialogue systems
    Jurcicek, Filip
    Thomson, Blaise
    Young, Steve
    COMPUTER SPEECH AND LANGUAGE, 2012, 26 (03): : 168 - 192