TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING

Cited by: 3
Authors
Sunder, Vishal [1]
Thomas, Samuel [2]
Kuo, Hong-Kwang J. [2]
Ganhotra, Jatin [2]
Kingsbury, Brian [2]
Fosler-Lussier, Eric [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] IBM Res AI, Yorktown Hts, NY USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
spoken dialog system; spoken language understanding; end-to-end systems;
DOI
10.1109/ICASSP43922.2022.9747871
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Dialog history plays an important role in spoken language understanding (SLU) performance in a dialog system. For end-to-end (E2E) SLU, previous work has used dialog history in text form, which makes the model dependent on a cascaded automatic speech recognizer (ASR). This negates the benefits of an E2E system, which is intended to be compact and robust to ASR errors. In this paper, we propose a hierarchical conversation model that can use dialog history directly in speech form, making it fully E2E. We also distill semantic knowledge from the available gold conversation transcripts by jointly training a similar text-based conversation model with an explicit tying of acoustic and semantic embeddings. In addition, we propose a novel technique, which we call DropFrame, to deal with the long training time incurred by adding dialog history in an E2E manner. On the HarperValleyBank dialog dataset, our E2E history integration outperforms a history-independent baseline by 7.7% absolute F1 score on the task of dialog action recognition. Our model performs competitively with the state-of-the-art history-based cascaded baseline but uses 48% fewer parameters. In the absence of gold transcripts to fine-tune an ASR model, our model outperforms this baseline by a significant margin of 10% absolute F1 score.
Pages: 7497-7501
Page count: 5