Improving the response timing estimation for spoken dialogue systems by reducing the effect of speech recognition delay

Authors
Sakuma, Jin [1]
Fujie, Shinya [1, 2]
Zhao, Huaibo [1]
Kobayashi, Tetsunori [1]
Affiliations
[1] Waseda Univ, Tokyo, Japan
[2] Chiba Inst Technol, Chiba, Japan
Source
Keywords
spoken dialog systems; turn-taking; response timing; streaming ASR
DOI
10.21437/Interspeech.2023-1618
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In conversational systems, the proper timing of the system's response is critical to maintaining a comfortable conversation. To estimate that timing appropriately, it is important to know what the user has said, including the most recent words, but ASR delay usually prevents the use of the full user utterance. In this paper, we attempted to employ an extremely low-latency ASR model, Multi-Look-Ahead ASR by Zhao et al., to make nearly the full utterance available for response timing estimation. Additionally, we examined the effectiveness of combining low-latency ASR with a parameter called Estimates of Syntactic Completeness (ESC), which indicates how soon the user's speech will be completed. We evaluated the approach on a Japanese simulated dialog database of a restaurant information center. The results confirmed that reducing ASR delay improves the accuracy of response timing estimation. This effect also appeared when the ESC-based method was combined with low-latency ASR.
Pages: 2668-2672
Page count: 5