Sequence-to-Sequence Multi-Modal Speech In-Painting

Cited by: 0

Authors:

Elyaderani, Mahsa Kadkhodaei [1]

Shirani, Shahram [1]

Affiliations:

[1] McMaster Univ, Dept Computat Sci & Engn, Hamilton, ON, Canada

Source:

INTERSPEECH 2023 | 2023

Keywords:

speech enhancement; speech in-painting; sequence-to-sequence models; multi-modality; Long Short-Term Memory networks; AUDIO; INTERPOLATION

DOI:

10.21437/Interspeech.2023-1848

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speech in-painting is the task of regenerating missing audio content using reliable context information. Despite various recent studies on multi-modal perception in audio in-painting, there is still a need for an effective fusion of visual and auditory information in speech in-painting. In this paper, we introduce a novel sequence-to-sequence model that leverages visual information to in-paint audio signals via an encoder-decoder architecture. The encoder plays the role of a lip-reader for facial recordings, and the decoder takes both the encoder outputs and the distorted audio spectrograms to restore the original speech. Our model outperforms an audio-only speech in-painting model and achieves results comparable to a recent multi-modal speech in-painter in terms of speech quality and intelligibility metrics for distortions of 300 ms to 1500 ms duration, which demonstrates the effectiveness of the introduced multi-modality in speech in-painting.
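As a rough illustration of the "distorted audio spectrograms" the decoder receives, the sketch below zeroes out a contiguous run of frames whose length corresponds to a gap in the 300 ms to 1500 ms range. It is not the authors' preprocessing: the function name `mask_gap`, the 10 ms frame hop (`hop_ms`), and the choice of zero-filling are all assumptions made for this example.

```python
import random


def mask_gap(spectrogram, gap_ms, hop_ms=10.0, start_frame=None, seed=None):
    """Zero out one contiguous gap in a spectrogram (hypothetical helper).

    spectrogram: list of frames, each frame a list of magnitudes.
    gap_ms:      gap duration in milliseconds (e.g. 300 to 1500).
    hop_ms:      assumed time step between frames; 10 ms is an illustrative
                 value, not one taken from the paper.
    Returns (distorted, mask), where mask[t] is 1 for reliable frames
    and 0 for frames inside the gap.
    """
    n_frames = len(spectrogram)
    # Convert the gap duration to a frame count, clamped to the signal length.
    gap_frames = min(n_frames, max(1, round(gap_ms / hop_ms)))
    if start_frame is None:
        # Pick a random gap position that fits entirely inside the signal.
        rng = random.Random(seed)
        start_frame = rng.randint(0, n_frames - gap_frames)
    end_frame = start_frame + gap_frames

    distorted = [
        [0.0] * len(frame) if start_frame <= t < end_frame else list(frame)
        for t, frame in enumerate(spectrogram)
    ]
    mask = [0 if start_frame <= t < end_frame else 1 for t in range(n_frames)]
    return distorted, mask


# Usage: a 2-second dummy spectrogram (200 frames, 4 bins) with a 300 ms gap.
spec = [[1.0] * 4 for _ in range(200)]
distorted, mask = mask_gap(spec, gap_ms=300, start_frame=50)
```

In a multi-modal setup such as the one described, a distorted spectrogram of this kind would be paired with the lip-reading encoder's outputs, and the mask identifies which frames the model is expected to restore.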

Pages: 829-833

Page count: 5
