A flexible spectral modification method based on temporal decomposition and Gaussian mixture model

Cited by: 7

Authors:

Binh Phu Nguyen [1]

Akagi, Masato [1]

Affiliations:

[1] Japan Adv Inst Sci & Technol, Sch Informat Sci, 1-1 Asahidai, Nomi 9231292, Japan

Source:

ACOUSTICAL SCIENCE AND TECHNOLOGY | 2009 / Vol. 30 / No. 3

Keywords:

Spectral modification; Temporal decomposition; Gaussian mixture model; STRAIGHT;

DOI:

10.1250/ast.30.170

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Manipulating spectral structure often degrades speech quality, mainly because the modified spectra are not smooth enough between frames and because the spectral modification itself is ineffective. This paper presents a new spectral modification method to improve the quality of modified speech. If frames are processed independently, discontinuous features may be generated. Therefore, a speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. Instead of modifying the speech spectra frame by frame, we only need to modify event targets and event functions. This makes the speech spectra easy to modify, and the smoothness of the modified speech is ensured by the shape of the event functions. To improve spectral modification, we use Gaussian mixture model parameters (spectral-GMM parameters) to model the spectral envelope of each event target, and develop a new algorithm for modifying spectral-GMM parameters in accordance with formant scaling factors. We first evaluate the effectiveness of the proposed method in spectral modeling, and then apply it to two areas that require different amounts of spectral modification: emotional speech synthesis and voice gender conversion. Experimental results verify the effectiveness of the proposed method for both spectral modeling and spectral modification.
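The abstract's point that smoothness follows from the shape of the event functions can be illustrated with a minimal sketch. It assumes the usual temporal decomposition model, in which each frame is approximated as a weighted sum of event targets, y(n) ≈ Σ_k a_k φ_k(n). The triangular event functions, the two-dimensional targets, and the uniform scaling factor below are hypothetical placeholders chosen for illustration; they are not the paper's spectral-GMM parameters or its formant scaling algorithm.

```python
# Illustrative sketch only: all names and numbers are hypothetical,
# not taken from the paper.

def triangular_event_functions(num_frames, centers):
    """Piecewise-linear event functions phi_k(n).

    Assumption: each function overlaps only its neighbours and the
    functions sum to 1 at every frame.
    """
    functions = [[0.0] * num_frames for _ in centers]
    for n in range(num_frames):
        if n <= centers[0]:
            functions[0][n] = 1.0
        elif n >= centers[-1]:
            functions[-1][n] = 1.0
        else:
            for k in range(len(centers) - 1):
                left, right = centers[k], centers[k + 1]
                if left <= n < right:
                    w = (n - left) / (right - left)
                    functions[k][n] = 1.0 - w
                    functions[k + 1][n] = w
                    break
    return functions


def reconstruct(targets, functions):
    """Rebuild frames as y(n) = sum_k a_k * phi_k(n)."""
    num_frames = len(functions[0])
    dim = len(targets[0])
    return [
        [sum(targets[k][d] * functions[k][n] for k in range(len(targets)))
         for d in range(dim)]
        for n in range(num_frames)
    ]


def scale_targets(targets, factor):
    """Placeholder modification: scale every target value by one factor."""
    return [[factor * value for value in target] for target in targets]


# Three hypothetical event targets (two feature values each) over 9 frames.
targets = [[500.0, 1500.0], [700.0, 1200.0], [300.0, 2200.0]]
phi = triangular_event_functions(9, centers=[0, 4, 8])

frames = reconstruct(targets, phi)
modified = reconstruct(scale_targets(targets, 1.1), phi)
```

Only the three targets are edited, yet all nine frames change, and because the same event functions interpolate between targets, the modified trajectory stays as smooth as the original one.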


Pages: 170-179

Number of pages: 10

References: 29 in total

[21] Paliwal K. K., 1995, P EUR C SPEECH COMM, P1029

[22] Efficient Vector Quantization of LPC Parameters at 24 Bits/Frame [J]. Paliwal, Kuldip K.; Atal, Bishnu S. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (01): 3-14

[23] Rix AW, 2001, INT CONF ACOUST SPEE, P749, DOI 10.1109/ICASSP.2001.941023

[24] AN ANALYSIS OF VARIANCE FOR PAIRED COMPARISONS [J]. SCHEFFE, H. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1952, 47 (259): 381-400

[25] Continuous probabilistic transform for voice conversion [J]. Stylianou, Y; Cappe, O; Moulines, E. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (02): 131-142

[26] Turajlic E, 2003, 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P724

[27] Zolfaghari P, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P553

[28] Zolfaghari P, 1996, ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, P1229, DOI 10.1109/ICSLP.1996.607830

[29] Dynamic assignment of Gaussian components in modelling speech spectra [J]. Zolfaghari, Parham; Kato, Hiroko; Minami, Yasuhiro; Nakamura, Atsushi; Katagiri, Shigeru; Patterson, Roy. JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2006, 45 (1-2): 7-19
