Effective Data Augmentation Methods for Neural Text-to-Speech Systems

Cited by: 0
Authors
Oh, Suhyeon [1]
Kwon, Ohsung [1]
Hwang, Min-Jae [1]
Kim, Jae-Min [1]
Song, Eunwoo [1]
Affiliations
[1] NAVER Corp, Seongnam, South Korea
Source
2022 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC) | 2022
Keywords
speech synthesis; self-augmentation; ranking support vector machine
DOI
10.1109/ICEIC54506.2022.9748515
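Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract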
This paper proposes an effective self-augmentation method for improving the quality of neural text-to-speech (TTS) systems. Because synthetic speech quality has improved greatly, it is now possible to build a neural TTS system from synthetic corpora. However, whether increasing the amount of synthetic data always improves training efficiency has not been verified. Our aim in this study is to select only the synthetic data whose characteristics are close to those of natural speech. Specifically, we adopt a ranking support vector machine (RankSVM), which is well known for effectively ranking relative attributes between binary classes. By treating the synthetic and recorded corpora as two opposite classes, RankSVM is used to determine how acoustically similar the synthesized speech is to the recorded data. Because training data can be selectively chosen from large-scale synthetic corpora, the performance of the TTS model retrained on those data is significantly improved. Subjective evaluation results verify that the proposed TTS model performs much better than both the original model trained with recorded data alone and a similarly configured system retrained with all the synthetic data without any selection method.
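The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which acoustic features or RankSVM settings were used, so the two-dimensional feature vectors, the hyperparameters, and the function names (`train_ranksvm`, `rank_score`, `select_top`) below are all hypothetical. The sketch trains a generic linear RankSVM with a pairwise hinge loss so that recorded utterances score higher than synthetic ones, then keeps the synthetic utterances with the highest scores.

```python
import random

def train_ranksvm(recorded, synthetic, epochs=200, lr=0.05, reg=1e-3, seed=0):
    """Learn w so that w.x_recorded exceeds w.x_synthetic by a margin of 1.

    Generic linear RankSVM objective (pairwise hinge loss plus L2 penalty),
    optimised with plain stochastic subgradient descent.
    """
    rng = random.Random(seed)
    dim = len(recorded[0])
    w = [0.0] * dim
    # Every (recorded, synthetic) pair is one ranking constraint.
    pairs = [(r, s) for r in recorded for s in synthetic]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for r, s in pairs:
            diff = [a - b for a, b in zip(r, s)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            for i in range(dim):
                # L2 shrinkage always; hinge term only when the margin is violated.
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w

def rank_score(w, x):
    """Higher score means the utterance is ranked closer to recorded speech."""
    return sum(wi * xi for wi, xi in zip(w, x))

def select_top(w, synthetic, k):
    """Return indices of the k synthetic utterances with the highest rank score."""
    order = sorted(range(len(synthetic)),
                   key=lambda i: rank_score(w, synthetic[i]),
                   reverse=True)
    return order[:k]

# Toy, made-up feature vectors (assumption: one vector per utterance).
recorded = [[1.0, 0.2], [0.9, -0.1], [1.1, 0.0]]
synthetic = [[0.1, 0.1], [0.8, 0.0], [0.3, -0.2], [0.7, 0.1]]

w = train_ranksvm(recorded, synthetic)
selected = select_top(w, synthetic, k=2)
print(sorted(selected))
```

In a setup like the one the abstract describes, the selected utterances would then be added to the training set used to retrain the TTS model; how many to keep (`k` here) is a design choice the abstract does not specify.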
Pages: 4