Implementation of sequential real-time waveform generator for high-quality vocoder

Authors:

Morise, Masanori [1, 2]

Affiliations:

[1] Meiji Univ, Sch Interdisciplinary Math Sci, Tokyo, Japan

[2] JST, PRESTO, Saitama, Japan

Source:

2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2020

Keywords:

SPEECH; ESTIMATOR; STRAIGHT; SYSTEM;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We describe an implementation of real-time waveform generation from vocoded speech parameters. High-quality vocoders such as STRAIGHT and WORLD have been used for voice conversion and statistical parametric speech synthesis. Current implementations of these vocoders provide a function that generates the whole waveform at once from the speech parameters of all frames. Implementations such as realtime STRAIGHT have been proposed to generate the waveform sequentially in short segments, but the resulting speech is inferior in sound quality to that of the original vocoder. To achieve sequential real-time waveform generation, we implemented a struct named WorldSynthesizer (WS struct) and six functions. The implementation is based on the WORLD vocoder and generates exactly the same waveform as the original, apart from a few points such as the random seed used for generating the white noise. We therefore evaluated its processing speed by using the real-time factor (RTF). The results showed that the processing speed of the proposed implementation decreased by 14.5% compared with the original WORLD. Nevertheless, the RTF of the proposed implementation calculated from female speech was below 0.1, which suggests that the implementation is capable of real-time synthesis.
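The real-time factor used in the abstract is conventionally the time spent synthesizing divided by the duration of the audio produced, so a value below 1 means the synthesizer runs faster than playback. The following is a minimal Python sketch of that ratio, not the paper's measurement procedure, which the abstract does not describe. The names `real_time_factor` and `measure_rtf`, and the placeholder synthesizer that returns silence, are hypothetical and chosen only for illustration.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(synthesize, sample_rate):
    """Time one call to `synthesize` and return its RTF.

    `synthesize` is a hypothetical callable that returns a sequence of
    samples; it stands in for whatever waveform generator is being tested.
    """
    start = time.perf_counter()
    samples = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    # Placeholder "synthesizer": one second of silence at 48 kHz.
    rtf = measure_rtf(lambda: [0.0] * 48000, sample_rate=48000)
    print(f"RTF = {rtf:.4f}, faster than real time: {rtf < 1.0}")
```

Under this definition, an RTF below 0.1 corresponds to generating one second of speech in less than 0.1 seconds of processing time.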

Pages: 821-825

Number of pages: 5
