Sivaraman, Ganesh [1]
Nagarsheth, Parav [1]
Khoury, Elie [1]
Affiliation:
[1] Pindrop, Atlanta, GA 30308 USA
Source:
19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies | 2018
Speech synthesis has a wide range of applications in modern artificial intelligence technologies. Most state-of-the-art speech synthesis systems require large amounts of high-quality recordings of the target speaker. We focus on low-budget speech synthesis: our software performs statistical parametric speech synthesis using unlabeled, mixed-quality speech data sourced from the internet. An average voice model trained using a DNN is adapted to a target speaker using different speaker adaptation strategies. Preprocessing methods such as speech enhancement, diarization, and segmentation are applied to the sourced data. Utterance selection based on mean cepstral distortion and forced-alignment confidence is then applied to prune noisy and misaligned data. The mixed-quality data thus preprocessed is used to adapt the average voice model and the duration models to the target speaker. The software to be demonstrated automates the whole procedure from preprocessing to synthesis, and it will be demonstrated by performing live synthesis using audio sourced from YouTube.
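
The utterance-selection step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the distortion formula or any thresholds, so the sketch assumes the commonly used cepstral distortion measure in dB over time-aligned frames, and the names `cepstral_distortion`, `Utterance`, `select_utterances`, `max_distortion_db`, and `min_confidence` (along with their default values) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Scale factor of the commonly used cepstral distortion measure (result in dB).
# Assumption: the source does not state which exact formula was used.
_DB_SCALE = 10.0 / math.log(10.0)


def cepstral_distortion(ref: Sequence[Sequence[float]],
                        syn: Sequence[Sequence[float]]) -> float:
    """Average per-frame cepstral distortion (dB) between two sequences.

    Both sequences are assumed to be time-aligned already, with one
    cepstral vector per frame and the same number of frames.
    """
    if not ref or len(ref) != len(syn):
        raise ValueError("sequences must be non-empty and equally long")
    total = 0.0
    for ref_frame, syn_frame in zip(ref, syn):
        squared_error = sum((a - b) ** 2 for a, b in zip(ref_frame, syn_frame))
        total += _DB_SCALE * math.sqrt(2.0 * squared_error)
    return total / len(ref)


@dataclass
class Utterance:
    name: str
    distortion_db: float     # cepstral distortion against a reference
    align_confidence: float  # forced-alignment confidence, higher is better


def select_utterances(utterances: List[Utterance],
                      max_distortion_db: float = 8.0,   # hypothetical threshold
                      min_confidence: float = 0.5       # hypothetical threshold
                      ) -> List[Utterance]:
    """Keep utterances that pass both the distortion and the alignment check."""
    return [
        u for u in utterances
        if u.distortion_db <= max_distortion_db
        and u.align_confidence >= min_confidence
    ]
```

In this sketch an utterance is pruned if it fails either criterion, which is one plausible reading of "prune noisy and misaligned data"; a real pipeline might instead rank utterances or combine the two scores.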