SFNet: A Computationally Efficient Source Filter Model Based Neural Speech Synthesis

Times cited: 6

Authors:

Rao, Achuth M. V. [1]

Ghosh, Prasanta Kumar [1]

Affiliation:

[1] Indian Institute of Science, Department of Electrical Engineering, Bangalore 560012, Karnataka, India

Source:

IEEE Signal Processing Letters | 2020 / Vol. 27

Keywords:

Neural vocoder; source-filter model; computational complexity; Mel-spectrum; linear prediction; estimator

DOI:

10.1109/LSP.2020.3005031

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

Recently, neural speech synthesizers have achieved high-quality synthesis for text-to-speech applications, but real-time synthesis is possible only on devices with large memory that can afford high computational complexity. In this work, we reduce the complexity of a speech synthesizer by reformulating the source-filter model of speech so that the excitation signal is modeled as the sum of two signals. The first is an impulse train computed from the pitch sequence. The second is white noise passed through a filter bank with frequency-dependent gains. The parameters of the reformulated source-filter model are predicted by a neural network, referred to as SFNet. The network is trained by minimizing the ℓ1 error between the log Mel-spectrum of the predicted waveform and that of the ground-truth waveform. We demonstrate a significant reduction in memory and computational complexity compared with the state-of-the-art speaker-independent neural speech synthesizer, without any loss in the naturalness of the synthesized speech.
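
The abstract describes the two-part excitation and the training criterion only in words. The following is a minimal NumPy sketch under stated assumptions, not the SFNet implementation: a 16 kHz sample rate, a 10 ms frame hop, frame-level pitch values with 0 marking unvoiced frames, equal-width ideal (brick-wall) FFT bands standing in for the paper's filter bank, and a 40-band HTK-style Mel filterbank for the loss. All function names (`impulse_train`, `band_noise`, `mel_filterbank`, `log_mel`, `log_mel_l1`) are hypothetical. The neural network that predicts the gains and the vocal-tract filter applied to the excitation are omitted.

```python
import numpy as np

SR = 16000  # assumption: sample rate in Hz
HOP = 160   # assumption: frame hop of 10 ms at 16 kHz


def impulse_train(f0_frames, hop=HOP, sr=SR):
    """Deterministic part: one unit impulse per pitch period; f0 <= 0 means unvoiced."""
    f0 = np.repeat(np.maximum(np.asarray(f0_frames, dtype=float), 0.0), hop)
    cycles = np.floor(np.cumsum(f0 / sr))  # completed pitch periods up to each sample
    out = np.zeros_like(f0)
    out[1:] = cycles[1:] > cycles[:-1]     # pulse whenever a new period begins
    return out


def band_noise(gains, hop=HOP, seed=0):
    """Stochastic part: white noise split into equal-width FFT bands (assumed band
    layout), each band scaled by a per-frame gain. gains: (n_frames, n_bands)."""
    gains = np.asarray(gains, dtype=float)
    n_frames, n_bands = gains.shape
    n = n_frames * hop
    spec = np.fft.rfft(np.random.default_rng(seed).standard_normal(n))
    edges = np.linspace(0, len(spec), n_bands + 1).astype(int)
    out = np.zeros(n)
    for b in range(n_bands):
        mask = np.zeros(len(spec))
        mask[edges[b]:edges[b + 1]] = 1.0          # ideal band-pass channel b
        band = np.fft.irfft(spec * mask, n=n)
        out += band * np.repeat(gains[:, b], hop)  # frame-rate gain -> sample rate
    return out


def mel_filterbank(n_mels=40, n_fft=512, sr=SR):
    """Triangular filters on the HTK Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / (c - lo)  # rising edge
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / (hi - c)  # falling edge
    return fb


def log_mel(x, fb, n_fft=512, hop=HOP, eps=1e-5):
    """Log Mel magnitude spectrogram of a 1-D signal with len(x) >= n_fft."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag @ fb.T + eps)


def log_mel_l1(pred, target, fb):
    """Mean absolute difference between log Mel spectra (the l1 criterion)."""
    return float(np.mean(np.abs(log_mel(pred, fb) - log_mel(target, fb))))


# Usage: 6 voiced frames at 125 Hz followed by 4 unvoiced frames,
# with hypothetical constant noise gains in 4 bands.
f0 = [125.0] * 6 + [0.0] * 4
gains = np.full((10, 4), 0.05)
excitation = impulse_train(f0) + band_noise(gains)
```

Both excitation components share the frame grid, so their sum has `n_frames * hop` samples. NumPy only evaluates the loss here; training a network against this criterion would require the same steps written in a framework with automatic differentiation.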

Pages: 1170-1174

Number of pages: 5