MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production

Cited by: 0
Authors
Ma, Jian [1, 3]
Wang, Wenguan [2]
Yang, Yi [2]
Zheng, Feng [1]
Affiliations
[1] Southern Univ Sci & Technol, Shenzhen, Peoples R China
[2] Zhejiang Univ, ReLER, CCAI, Hangzhou, Peoples R China
[3] Univ Technol Sydney, ReLER, Ultimo, NSW, Australia
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sign language understanding has made significant strides; however, there is still no viable solution for generating sign sequences directly from entire spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users. In particular, a sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step. Moreover, by creating a joint embedding space for text, audio, and sign, we bind these modalities and leverage the semantic consistency among them to provide informative feedback for the model training. This embedding-consistency learning strategy minimizes the reliance on sign triplets and ensures continuous model refinement, even with a missing audio modality. Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
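The embedding-consistency idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of cosine distance, and the two-dimensional vectors are all assumptions made for illustration. It only shows how a consistency term over a shared embedding space can still be computed when the audio embedding is missing.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def consistency_loss(embeddings):
    """Mean (1 - cosine similarity) over every pair of available modalities.

    `embeddings` is a hypothetical mapping from a modality name to its
    vector in the shared space, or to None when the modality is missing.
    """
    present = [v for v in embeddings.values() if v is not None]
    pairs = [(present[i], present[j])
             for i in range(len(present))
             for j in range(i + 1, len(present))]
    if not pairs:
        return 0.0  # fewer than two modalities: nothing to compare
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)

# Toy vectors (assumed): text and audio agree, sign is orthogonal to both.
full = {"text": [1.0, 0.0], "audio": [1.0, 0.0], "sign": [0.0, 1.0]}
no_audio = {"text": [1.0, 0.0], "audio": None, "sign": [0.0, 1.0]}

print(consistency_loss(full))      # averaged over three pairs
print(consistency_loss(no_audio))  # only the text-sign pair remains
```

In this sketch a missing modality simply drops out of the pair list, so a text-sign pair alone still yields a training signal; the paper's actual loss may differ.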
Pages: 7241-7254
Number of pages: 14