TEXT-TO-SPEECH SYSTEMS FOR FILIPINO USING UNIT SELECTION AND DEEP LEARNING

Cited by: 0

Authors:

Renovalles, Edsel Jedd [1]

Lucas, Crisron Rudolf [1]

de Leon, Franz [1]

Aquino, Angelina [1]

Jalandoni, Izza [1]

Affiliations:

[1] University of the Philippines Diliman, Electrical and Electronics Engineering Institute, Quezon City, Philippines

Source:

2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021

Keywords:

data augmentation; deep learning; text-to-speech; unit selection; voice conversion

DOI:

10.1109/O-COCOSDA202152914.2021.9660431

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Several text-to-speech (TTS) systems have been developed and published for Philippine languages, particularly Filipino and Cebuano, but the performance of existing systems still needs improvement. Limited linguistic resources and a lack of available speech data make it difficult to develop a reliable TTS system for these languages. In this paper, we implement and evaluate two TTS systems for Filipino: a unit selection system built with MaryTTS and a deep learning system built with Tacotron-2. We applied F0 contour and duration modification to the unit selection system, and used voice conversion to augment the training data of Tacotron-2. The unit selection system achieved a mean opinion score (MOS) of 3.05; however, the boundary-based F0 modification produced perceivable distortions in the output and needs further enhancement to be effective. In contrast, using voice conversion to transform the original multi-speaker data into single-speaker data, and thereby produce more training samples, raised the overall MOS of Tacotron-2 from 1.51 to 2.01.
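
The MOS figures quoted in the abstract are averages of listener ratings. As a minimal sketch of how such a score might be computed, the Python below averages a list of ratings and reports an approximate 95% confidence half-width. The 1-to-5 rating scale, the function name mean_opinion_score, the normal-approximation interval, and the sample ratings are all assumptions made for illustration; the record does not describe the authors' rating scale or evaluation procedure.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings are on a 1-5 scale (an assumption, not stated in the
    abstract) and uses a normal approximation for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings are assumed to lie on a 1-5 scale")
    mos = mean(ratings)
    # Standard error of the mean, from the sample standard deviation.
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings for one system (not data from the paper).
example_ratings = [3, 3, 4, 2, 3]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean helps a reader judge whether a difference such as 1.51 versus 2.01 is large relative to rater variability, although that depends on the number of listeners and utterances, which this record does not give.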


Pages: 212-217

Page count: 6
