Time-varying sinusoidal demodulation for non-stationary modeling of speech

Cited by: 1

Authors:

Sharma, Neeraj Kumar [1]

Sreenivas, Thippur V. [1]

Affiliations:

[1] Indian Inst Sci, Dept Elect Commun Engn, Bangalore 560012, Karnataka, India

Source:

SPEECH COMMUNICATION | 2018 / Vol. 105

Keywords:

Speech modeling; Sinusoidal modeling; Speech analysis; Speech synthesis; Harmonic demodulation; Subband modeling; INSTANTANEOUS-FREQUENCY; SIGNAL DECOMPOSITION; ENVELOPE; REPRESENTATIONS;

DOI:

10.1016/j.specom.2018.10.008

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speech signals contain a fairly rich time-evolving spectral content. Accurate analysis of this time-evolving spectrum is an open challenge in signal processing. To this end, we revisit time-varying sinusoidal modeling of speech and propose an alternative model estimation approach. The estimation operates on the whole signal without any short-time analysis. The approach proceeds by extracting the fundamental frequency sinusoid (FFS) from the speech signal. The instantaneous amplitude (IA) of the FFS is used for voiced/unvoiced stream segregation. The voiced stream is then demodulated using a variant of in-phase and quadrature-phase demodulation carried out at harmonics of the FFS. The result is a non-parametric time-varying sinusoidal representation: specifically, an additive mixture of quasi-harmonic sinusoids for the voiced stream and a wideband mono-component sinusoid for the unvoiced stream. The representation is evaluated for analysis-synthesis, and the bandwidths of the IA and instantaneous frequency (IF) signals are found to be crucial in preserving quality. The obtained IA and IF signals are also found to be carriers of perceived speech attributes, such as speaker characteristics and intelligibility. Compared with existing approaches, which operate on short-time segments, the proposed modeling framework shows improvement in simplicity of implementation, objective scores, and computation time. The listening test scores suggest that the quality preserves naturalness but does not yet beat the state-of-the-art short-time analysis methods. In summary, the proposed representation lends itself to high-resolution temporal analysis of non-stationary speech signals, and also allows quality-preserving modification and synthesis.
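
To make the in-phase and quadrature-phase demodulation step concrete, the following is a minimal sketch of textbook I/Q demodulation at one harmonic of a fundamental. It is not the authors' implementation: the abstract only says that a "variant" of this scheme is used, and its details are not given here. The function names (`moving_average`, `iq_demodulate`), the moving-average low-pass filter, and the constant-pitch toy signal are all assumptions chosen for illustration.

```python
import math


def moving_average(x, width):
    """Crude low-pass filter (an assumption, not the paper's filter):
    mean over a window of `width` samples, shortened at the signal edges."""
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - width // 2)
        hi = min(n, lo + width)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def iq_demodulate(signal, f0_phase, harmonic, lp_width):
    """Estimate the instantaneous amplitude (IA) and residual phase of the
    `harmonic`-th harmonic, given the phase track of the fundamental."""
    # Mix with in-phase (cos) and quadrature (-sin) carriers at k * phase.
    i_mix = [s * math.cos(harmonic * p) for s, p in zip(signal, f0_phase)]
    q_mix = [-s * math.sin(harmonic * p) for s, p in zip(signal, f0_phase)]
    # Low-pass filtering removes the other harmonics and the double-frequency term.
    i_lp = moving_average(i_mix, lp_width)
    q_lp = moving_average(q_mix, lp_width)
    # The factor 2 compensates for the 1/2 introduced by the mixing product.
    ia = [2.0 * math.hypot(i, q) for i, q in zip(i_lp, q_lp)]
    phase = [math.atan2(q, i) for i, q in zip(i_lp, q_lp)]
    return ia, phase


# Toy "voiced" signal: three harmonics of a constant 100 Hz fundamental.
fs, f0, n = 8000, 100.0, 1600
f0_phase = [2.0 * math.pi * f0 * t / fs for t in range(n)]
amps = {1: 1.0, 2: 0.5, 3: 0.25}
offsets = {1: 0.0, 2: 0.3, 3: -0.7}
signal = [sum(amps[k] * math.cos(k * p + offsets[k]) for k in amps)
          for p in f0_phase]

# One pitch period (80 samples) as the averaging window.
ia2, ph2 = iq_demodulate(signal, f0_phase, harmonic=2, lp_width=80)
print(round(ia2[n // 2], 6), round(ph2[n // 2], 6))
```

In this toy case the averaging window spans exactly one pitch period, so the other harmonics cancel and the mid-signal estimates match the synthetic amplitude and phase offset of the second harmonic. With real speech the fundamental phase track varies over time and a properly designed low-pass filter would be needed; the abstract notes that the bandwidth of the IA and IF signals is crucial for quality.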

Pages: 77-91

Page count: 15