Sequence-to-Sequence Multi-Modal Speech In-Painting

Authors
Elyaderani, Mahsa Kadkhodaei [1]
Shirani, Shahram [1]
Affiliations
[1] McMaster University, Department of Computational Science and Engineering, Hamilton, ON, Canada
Source
Keywords
speech enhancement; speech in-painting; sequence-to-sequence models; multi-modality; Long Short-Term Memory networks; AUDIO; INTERPOLATION
DOI
10.21437/Interspeech.2023-1848
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech in-painting is the task of regenerating missing audio content using reliable context information. Despite various recent studies on multi-modal perception in audio in-painting, there is still a need for an effective infusion of visual and auditory information in speech in-painting. In this paper, we introduce a novel sequence-to-sequence model that leverages visual information to in-paint audio signals via an encoder-decoder architecture. The encoder acts as a lip-reader for facial recordings, and the decoder takes both the encoder outputs and the distorted audio spectrograms to restore the original speech. Our model outperforms an audio-only speech in-painting model and achieves results comparable to a recent multi-modal speech in-painter in terms of speech quality and intelligibility metrics for distortions of 300 ms to 1500 ms in duration, which demonstrates the effectiveness of the introduced multi-modality in speech in-painting.
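The abstract states that the decoder receives both the encoder outputs and the distorted spectrogram. The following is a minimal sketch of how such a decoder input could be prepared, not the authors' implementation. The 10 ms hop, the zero-masking of the gap, the nearest-frame alignment, and all function names are assumptions made for illustration; `visual_feats` merely stands in for the outputs of a learned lip-reading encoder.

```python
# Illustrative sketch only: hop size, zero-masking, nearest-frame alignment
# and all names below are assumptions, not details taken from the paper.

def gap_frames(start_ms, duration_ms, hop_ms=10):
    """Return the range of spectrogram frame indices covered by a gap."""
    first = start_ms // hop_ms
    last = -(-(start_ms + duration_ms) // hop_ms)  # ceiling division
    return range(first, last)

def mask_spectrogram(spec, frames):
    """Return a copy of spec (list of per-frame lists) with the given frames zeroed."""
    masked = [row[:] for row in spec]
    for t in frames:
        if 0 <= t < len(masked):
            masked[t] = [0.0] * len(masked[t])
    return masked

def fuse(masked_spec, visual_feats):
    """Concatenate each audio frame with the nearest-in-time visual feature vector."""
    n_a, n_v = len(masked_spec), len(visual_feats)
    return [
        masked_spec[t] + visual_feats[min(t * n_v // n_a, n_v - 1)]
        for t in range(n_a)
    ]

# Toy data: 2 s of audio at a 10 ms hop (200 frames, 4 bins) and
# 2 s of video at 25 fps (50 frames, 2 features per frame).
spec = [[1.0] * 4 for _ in range(200)]
visual_feats = [[0.5, 0.5] for _ in range(50)]

gap = gap_frames(start_ms=500, duration_ms=300)   # shortest gap in the abstract
decoder_input = fuse(mask_spectrogram(spec, gap), visual_feats)
```

In this toy setup each decoder input frame holds the (possibly zeroed) audio bins followed by the visual features, so the visual stream remains available inside the gap where the audio is missing.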
Pages: 829-833
Page count: 5