Semantic Complexity in End-to-End Spoken Language Understanding

Cited by: 7
Authors
McKenna, Joseph P. [1 ]
Choudhary, Samridhi [1 ]
Saxon, Michael [1 ]
Strimel, Grant P. [1 ]
Mouchtaris, Athanasios [1 ]
Affiliations
[1] Amazon, Alexa Machine Learning, Seattle, WA 98109 USA
Source
Keywords
spoken language understanding; semantic complexity; speech-to-interpretation; NETWORKS
DOI
10.21437/Interspeech.2020-2929
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied. We introduce empirical measures of dataset semantic complexity to quantify the difficulty of the SLU tasks. We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets that have low semantic complexity values. We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease. Our results show that it is important to contextualize an STI model's performance with the complexity values of its training dataset to reveal the scope of its applicability.
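The abstract does not define the semantic complexity measures, so the following is only an illustrative sketch of what a dataset-level measure of this kind could look like. It assumes, hypothetically, that complexity is approximated by the Shannon entropy of word n-grams over the transcripts; the function name `ngram_entropy` and the toy transcripts are made up for this example and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_entropy(transcripts, n=1):
    """Shannon entropy (in bits) of the word n-gram distribution.

    Hypothetical stand-in for a dataset complexity measure: a dataset
    whose transcripts reuse the same few n-grams scores low, a dataset
    with varied wording scores high.
    """
    counts = Counter()
    for text in transcripts:
        tokens = text.lower().split()
        # Count every contiguous window of n tokens in this transcript.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1

    total = sum(counts.values())
    if total == 0:
        return 0.0  # No n-grams at all: define the entropy as zero.

    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy data (made up): a narrow command set versus more varied requests.
narrow = ["turn on the lights"] * 4
broad = [
    "turn on the lights",
    "play some jazz music",
    "what is the weather",
    "set a timer now",
]

print(ngram_entropy(narrow))
print(ngram_entropy(broad))
```

Under this assumed measure, the narrow command set scores lower than the varied one, which mirrors the abstract's point that reported results should be read alongside the complexity of the dataset they were obtained on.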
Pages: 4273-4277
Number of pages: 5
Related papers
50 in total
  • [41] The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
    He, Mutian
    Garner, Philip N.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 4408 - 4423
  • [42] End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding
    Chen, Yun-Nung
    Hakkani-Tur, Dilek
    Tur, Gokhan
    Gao, Jianfeng
    Deng, Li
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3245 - 3249
  • [43] Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
    Denisov, Pavel
    Vu, Ngoc Thang
    INTERSPEECH 2020, 2020, : 881 - 885
  • [44] Analysis of Acoustic information in End-to-End Spoken Language Translation
    Sant, Gerard
    Escolano, Carlos
    INTERSPEECH 2023, 2023, : 52 - 56
  • [45] EFFICIENT USE OF END-TO-END DATA IN SPOKEN LANGUAGE PROCESSING
    Lu, Yiting
    Wang, Yu
    Gales, Mark J. F.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7518 - 7522
  • [46] ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding
    Sunder, Vishal
    Fosler-Lussier, Eric
    Thomas, Samuel
    Kuo, Hong-Kwang J.
    Kingsbury, Brian
    INTERSPEECH 2023, 2023, : 1129 - 1133
  • [47] Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability
    Caubriere, Antoine
    Tomashenko, Natalia
    Laurent, Antoine
    Morin, Emmanuel
    Camelin, Nathalie
    Esteve, Yannick
    INTERSPEECH 2019, 2019, : 1198 - 1202
  • [48] Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
    Cappellazzo, Umberto
    Yang, Muqiao
    Falavigna, Daniele
    Brutti, Alessio
    INTERSPEECH 2023, 2023, : 2953 - 2957
  • [49] NON-AUTOREGRESSIVE END-TO-END APPROACHES FOR JOINT AUTOMATIC SPEECH RECOGNITION AND SPOKEN LANGUAGE UNDERSTANDING
    Li, Mohan
    Doddipatla, Rama
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 390 - 397
  • [50] End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
    Desot, Thierry
    Portet, Francois
    Vacher, Michel
    COMPUTER SPEECH AND LANGUAGE, 2022, 75