ON THE INTERPLAY BETWEEN SPARSITY, NATURALNESS, INTELLIGIBILITY, AND PROSODY IN SPEECH SYNTHESIS

Times cited: 0
Authors
Lai, Cheng-I Jeff [1, 2]
Cooper, Erica [3]
Zhang, Yang [2]
Chang, Shiyu [2]
Qian, Kaizhi [2]
Liao, Yi-Lun [1]
Chuang, Yung-Sung [1]
Liu, Alexander H. [1]
Yamagishi, Junichi [3]
Cox, David [2]
Glass, James [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] MIT, IBM Watson AI Lab, Cambridge, MA 02139 USA
[3] Natl Inst Informat, Tokyo, Japan
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
text-to-speech; vocoder; speech synthesis; pruning; efficiency
DOI
10.1109/ICASSP43922.2022.9747728
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explore several aspects of TTS pruning: amount of finetuning data versus sparsity, TTS-Augmentation to utilize unspoken text, and combining knowledge distillation and pruning. Our findings suggest that not only are end-to-end TTS models highly prunable, but also, perhaps surprisingly, pruned TTS models can produce synthetic speech with equal or higher naturalness and intelligibility, with similar prosody. All of our experiments are conducted on publicly available models, and findings in this work are backed by large-scale subjective tests and objective measures. Code and 200 pruned models are made available to facilitate future research on efficiency in TTS.
Pages: 8447-8451
Page count: 5