Fast concatenative speech synthesis using pre-fused speech units based on the plural unit selection and fusion method

Cited by: 1

Authors:

Tamura, Masatsune [1]

Mizutani, Tatsuya

Kagoshima, Takehiko

Affiliations:

[1] Toshiba Co Ltd, Ctr Corp Res & Dev, Multimedia Lab, Kawasaki, Kanagawa 2128582, Japan

[2] Toshiba Co Ltd, Semicond Co, Ome 1988710, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2007 / Vol. E90D / No. 02

Keywords:

concatenative speech synthesis; unit selection; unit fusion; offline unit fusion; frequency-weighted VQ;

DOI:

10.1093/ietisy/e90-d.2.544

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We have previously developed a concatenative speech synthesizer based on the plural speech unit selection and fusion method, which can synthesize stable and human-like speech. In this method, plural speech units for each speech segment are selected using a cost function and fused by averaging pitch-cycle waveforms. This method has a large computational cost, yet some platforms require a speech synthesis system that works within limited hardware resources. In this paper, we propose an offline unit fusion method that reduces the computational cost. In the proposed method, speech units are fused in advance to build a pre-fused speech unit database. At synthesis time, a speech unit for each segment is selected from the pre-fused speech unit database, and the speech waveform is synthesized by applying prosodic modification and concatenation, without the computationally expensive unit fusion process. We compared several algorithms for constructing the pre-fused speech unit database. Subjective and objective evaluations confirm the effectiveness of the proposed method: the quality of synthetic speech from the offline unit fusion method with a 100 MB database is close to that of the online unit fusion method with a 93 MB JP database and slightly lower than that of the 390 MB US database, while the computational time is reduced by 80%. We also show that the frequency-weighted VQ-based method is effective for constructing the pre-fused speech unit database.
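
As a rough illustration of the offline fusion idea summarized in the abstract, the sketch below averages pitch-cycle waveforms across several candidate units ahead of time and then only looks up and concatenates the pre-fused result at synthesis time. It is not the authors' implementation: the function names (`fuse_pitch_cycles`, `build_prefused_db`, `synthesize`), the data layout, and the assumption that all candidate units are already aligned with the same number of equal-length pitch cycles are hypothetical. Cost-based unit selection, prosodic modification, and the frequency-weighted VQ step are omitted.

```python
from statistics import fmean


def fuse_pitch_cycles(units):
    """Fuse candidate units by averaging their pitch-cycle waveforms.

    units: list of units; each unit is a list of pitch cycles, and each
    pitch cycle is a list of float samples. All units are assumed
    (hypothetically) to be aligned and identical in shape.
    """
    n_cycles = len(units[0])
    fused = []
    for c in range(n_cycles):
        # Gather the c-th pitch cycle of every candidate unit.
        cycles = [unit[c] for unit in units]
        # Average sample by sample across the candidates.
        fused.append([fmean(samples) for samples in zip(*cycles)])
    return fused


def build_prefused_db(candidates_by_segment):
    """Offline step: fuse the candidates for every segment key once."""
    return {
        key: fuse_pitch_cycles(units)
        for key, units in candidates_by_segment.items()
    }


def synthesize(prefused_db, segment_keys):
    """Synthesis-time step: look up pre-fused units and concatenate them.

    No fusion happens here, which is where the computational saving
    would come from in this simplified picture.
    """
    waveform = []
    for key in segment_keys:
        for cycle in prefused_db[key]:
            waveform.extend(cycle)
    return waveform
```

In this simplified picture, `build_prefused_db` runs once when the database is built, so the per-utterance cost of `synthesize` is limited to lookup and concatenation.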


Pages: 544-553

Page count: 10


[31] HMM-based Unit Selection Using Frame Sized Speech Segments [J].

Ling, Zhen-Hua ;

Wang, Ren-Hua .

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, :2034-2037

[32] EXTRACTING UNIT EMBEDDINGS USING SEQUENCE-TO-SEQUENCE ACOUSTIC MODELS FOR UNIT SELECTION SPEECH SYNTHESIS [J].

Zhou, Xiao ;

Ling, Zhen-Hua ;

Dai, Li-Rong .

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, :7659-7663

[33] A HMM Based Speech Synthesis Method Using Articulatory Feature [J].

Li, Yong ;

Yin, Qing .

PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, :185-189

[34] DNN-Based Unit Selection Using Frame-Sized Speech Segments [J].

Zhou, Zhi-Ping ;

Ling, Zhen-Hua .

2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,

[35] A classifier-based target cost for unit selection speech synthesis trained on perceptual data [J].

Strom, Volker ;

King, Simon .

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, :150-153

[36] HMM-based unit selection speech synthesis using log likelihood ratios derived from perceptual data [J].

Xia, Xian-Jun ;

Ling, Zhen-Hua ;

Jiang, Yuan ;

Dai, Li-Rong .

SPEECH COMMUNICATION, 2014, 63-64 :27-37

[37] A Pre-Selection of Candidate Units Using Accentual Characteristic In a Unit Selection Based Japanese TTS System [J].

Na, Deok-Su ;

Min, So-Yeon ;

Lee, Kwang-Hyoung ;

Lee, Jong-Seok ;

Bae, Myung-Jin .

JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2007, 26 (04) :159-165

[38] Czech Expressive Speech Synthesis in Limited Domain Comparison of Unit Selection and HMM-Based Approaches [J].

Gruber, Martin ;

Hanzlicek, Zdenek .

TEXT, SPEECH AND DIALOGUE, TSD 2012, 2012, 7499 :656-664

[39] Unit selection based speech synthesis for converting short text message into voice message in mobile phones [J].

Bharthi, B. ;

Kavitha, S. ;

Kotwal, Nekshan Percy ;

Parasaram, Nivedita ;

Piriyanga, J. .

2017 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2017,
