Synthesizing Suprasegmental Speech Information Using Hybrid of GA-ACO and Dynamic Neural Network

Times cited: 0

Authors:

Sheikhan, Mansour [1]

Affiliations:

[1] Islamic Azad Univ, Dept Elect Engn, South Tehran Branch, Tehran, Iran

Source:

2013 5TH CONFERENCE ON INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT) | 2013

Keywords:

suprasegmental information; neural network; genetic algorithm; ant colony optimization; feature selection; GENERATION; RULES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In machine-generated speech, removing suprasegmental information (such as stress, timing, and pitch frequency) results in unpleasant speech. To provide this information for synthesizing natural speech in the Farsi language, a dynamic neural network (DNN) is used in this study. The inputs of the DNN are word-level and syllable-level features, such as part-of-speech tags, word length, and type of punctuation mark at the word level, and type of vowel and consonants and a syllable position indicator at the syllable level. To reduce the number of DNN inputs, a hybrid of a genetic algorithm (GA) and ant colony optimization (ACO) is used for feature selection. The output layer of the DNN includes nine nodes, which provide suprasegmental information at the syllable level, including pitch contour, log-energy level, duration information, and pause data. Simulation results show that this hybrid soft-computing model predicts suprasegmental information with low root mean square error.

Citation

Pages: 175-180

Number of pages: 6
