Generative Spoken Dialogue Language Modeling

Cited by: 12
Authors
Nguyen, Tu Anh [1, 2]
Kharitonov, Eugene [1, 3]
Copet, Jade [1 ]
Adi, Yossi
Hsu, Wei-Ning [4 ]
Elkahky, Ali [4 ]
Tomasello, Paden [4 ]
Algayres, Robin [1 ]
Sagot, Benoit [2 ]
Mohamed, Abdelrahman [1 ]
Dupoux, Emmanuel [1, 5]
Affiliations
[1] Meta AI Res, Paris, France
[2] Inria, Paris, France
[3] Meta Res, Tel Aviv, Israel
[4] Meta Res, New York, NY USA
[5] EHESS, ENS PSL, CNRS, Paris, France
Keywords
TURN-TAKING; ORGANIZATION
DOI
10.1162/tacl_a_00545
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter, and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking compared to a text-based cascaded model.
Pages: 250-266
Page count: 17