Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models

Cited by: 2
Authors
Levkovitch, Alon [1]
Nachmani, Eliya [1, 2]
Wolf, Lior [1]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
[2] Facebook AI Res, Tel Aviv, Israel
Source
INTERSPEECH 2022 | 2022
Funding
European Research Council
DOI
10.21437/Interspeech.2022-10045
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training. The method requires a short (approximately 3 seconds) sample from the target person, and generation is steered at inference time, without any training steps. At the heart of the method lies a sampling process that combines the estimation of the denoising model with a low-pass version of the new speaker's sample. The objective and subjective evaluations show that our sampling method can generate a voice similar to that of the target speaker in terms of frequency, with an accuracy comparable to state-of-the-art methods, and without training.
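The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one plausible reading: at each reverse-diffusion step, the low-frequency band of the model's estimate is replaced by the low-frequency band of the (noised) reference sample. The names `low_pass`, `guided_step`, and `keep`, the FFT-based filter, and the choice to filter along the last axis are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def low_pass(x, keep):
    """Hypothetical filter: keep only the `keep` lowest rFFT bins on the last axis."""
    spec = np.fft.rfft(x, axis=-1)
    spec[..., keep:] = 0.0  # zero out every bin above the cutoff
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

def guided_step(x_denoised, ref_noised, keep):
    """Assumed combination rule: swap the estimate's low band for the reference's.

    x_denoised: the denoising model's estimate at the current step
    ref_noised: the target speaker's sample, noised to the same step
    """
    return x_denoised - low_pass(x_denoised, keep) + low_pass(ref_noised, keep)

# Toy arrays standing in for spectrogram-like features (shapes are arbitrary).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
ref = rng.standard_normal((4, 64))
out = guided_step(x, ref, keep=8)
```

Under these assumptions, the result carries the reference's low-frequency content and the model estimate's high-frequency content, which is the sense in which generation is steered at inference time without any training step.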
Pages: 2983-2987
Number of pages: 5