Generative Spoken Dialogue Language Modeling

Cited by: 12
Authors
Nguyen, Tu Anh [1, 2]
Kharitonov, Eugene [1, 3]
Copet, Jade [1 ]
Adi, Yossi
Hsu, Wei-Ning [4 ]
Elkahky, Ali [4 ]
Tomasello, Paden [4 ]
Algayres, Robin [1 ]
Sagot, Benoit [2 ]
Mohamed, Abdelrahman [1 ]
Dupoux, Emmanuel [1, 5]
Affiliations
[1] Meta AI Res, Paris, France
[2] Inria, Paris, France
[3] Meta Res, Tel Aviv, Israel
[4] Meta Res, New York, NY USA
[5] EHESS, ENS PSL, CNRS, Paris, France
Keywords
TURN-TAKING; ORGANIZATION
DOI
10.1162/tacl_a_00545
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It builds on recent work on unsupervised spoken unit discovery, coupled with a dual-tower transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter, and other paralinguistic signals in the two channels simultaneously, and that it reproduces more naturalistic and fluid turn-taking than a text-based cascaded model.
Pages: 250-266
Page count: 17