Semantic Complexity in End-to-End Spoken Language Understanding

Cited by: 7
Authors
McKenna, Joseph P. [1 ]
Choudhary, Samridhi [1 ]
Saxon, Michael [1 ]
Strimel, Grant P. [1 ]
Mouchtaris, Athanasios [1 ]
Affiliations
[1] Amazon, Alexa Machine Learning, Seattle, WA 98109 USA
Source
Keywords
spoken language understanding; semantic complexity; speech-to-interpretation; NETWORKS
DOI
10.21437/Interspeech.2020-2929
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied. We introduce empirical measures of dataset semantic complexity to quantify the difficulty of the SLU tasks. We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets that have low semantic complexity values. We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease. Our results show that it is important to contextualize an STI model's performance with the complexity values of its training dataset to reveal the scope of its applicability.
Pages: 4273 - 4277
Page count: 5
Related papers
50 in total
  • [21] Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
    Kim, Suyoun
    Shrivastava, Akshat
    Duc Le
    Lin, Ju
    Kalinli, Ozlem
    Seltzer, Michael L.
    INTERSPEECH 2023, 2023, : 1119 - 1123
  • [22] USING SPEECH SYNTHESIS TO TRAIN END-TO-END SPOKEN LANGUAGE UNDERSTANDING MODELS
    Lugosch, Loren
    Meyer, Brett H.
    Nowrouzezahrai, Derek
    Ravanelli, Mirco
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8499 - 8503
  • [23] Two-Pass Low Latency End-to-End Spoken Language Understanding
    Arora, Siddhant
    Dalmia, Siddharth
    Chang, Xuankai
    Yan, Brian
    Black, Alan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3478 - 3482
  • [24] Low-bit Shift Network for End-to-End Spoken Language Understanding
    Avila, Anderson R.
    Bibi, Khalil
    Yang, Ruiheng
    Li, Xinlin
    Xing, Chao
    Chen, Xiao
    INTERSPEECH 2022, 2022, : 2698 - 2702
  • [25] Confidence measure for speech-to-concept end-to-end spoken language understanding
    Caubriere, Antoine
    Esteve, Yannick
    Laurent, Antoine
    Morin, Emmanuel
    INTERSPEECH 2020, 2020, : 1590 - 1594
  • [26] Speech Model Pre-training for End-to-End Spoken Language Understanding
    Lugosch, Loren
    Ravanelli, Mirco
    Ignoto, Patrick
    Tomar, Vikrant Singh
    Bengio, Yoshua
    INTERSPEECH 2019, 2019, : 814 - 818
  • [27] TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING
    Sunder, Vishal
    Thomas, Samuel
    Kuo, Hong-Kwang J.
    Ganhotra, Jatin
    Kingsbury, Brian
    Fosler-Lussier, Eric
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7497 - 7501
  • [28] Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding
    Huang, Lingyan
    Li, Tao
    Zhou, Haodong
    Hong, Qingyang
    Li, Lin
    INTERSPEECH 2023, 2023, : 1124 - 1128
  • [29] END-TO-END SPOKEN LANGUAGE UNDERSTANDING WITHOUT MATCHED LANGUAGE SPEECH MODEL PRETRAINING DATA
    Price, Ryan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7979 - 7983
  • [30] ATTENTIVE CONTEXTUAL CARRYOVER FOR MULTI-TURN END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Wei, Kai
    Tran, Thanh
    Chang, Feng-Ju
    Sathyendra, Kanthashree Mysore
    Muniyappa, Thejaswi
    Hu, Jing
    Raju, Anirudh
    McGowan, Ross
    Susanj, Nathan
    Rastrow, Ariya
    Strimel, Grant P.
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 837 - 844