A Spectral Variation Function for Variable Time-Scale Modification of Speech

Cited by: 0

Authors:

Kachare, Pramod H. ^{[1,2]}

Pandey, Prem C. ^{[1]}

Affiliations:

[1] Indian Inst Technol, Dept Elect Engn, Mumbai, Maharashtra, India

[2] Ramrao Adik Inst Technol, Dept Elect & Telecom Engn, Navi Mumbai, India

Source:

2021 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC) | 2021

Keywords:

Spectral variation function; time-scale modification; voice conversion; TRANSFORMATION; NOISE

DOI:

10.1109/NCC52529.2021.9530088

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The spectral variation function is used to detect salient segments (segments with sharp spectral transitions). It is calculated from the cosine of the angle between the averaged feature vectors of adjacent segments. A modified version of this function is presented for variable time-scale modification of speech signals. It uses the magnitude spectrum smoothed by auditory critical-band filters and a small offset in the normalization of the angle cosine. Test results showed that the modified function detects spectral saliencies without producing spurious peaks. It is applied to variable time-scale modification that leaves the overall duration unchanged. Listening tests showed significantly better speech quality for processing using the modified function.
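The cosine computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `adjacent_segment_cosine`, the parameters `half_width` and `offset`, and their default values are hypothetical. The abstract does not say exactly where the offset enters or how the cosine is mapped to the final function, so the sketch simply adds the offset to the denominator and returns the cosine itself. The input is assumed to be a frame-by-band array of magnitude spectra that have already been smoothed (for example by critical-band filters).

```python
import numpy as np

def adjacent_segment_cosine(features, half_width=2, offset=1e-3):
    """Cosine of the angle between the mean feature vectors of the
    segments just before and just after each frame.

    features   : (num_frames, num_bands) array, assumed to hold
                 already-smoothed magnitude spectra (assumption).
    half_width : frames per segment (hypothetical parameter).
    offset     : small constant added to the normalization term
                 (hypothetical value; placement is an assumption).

    Frames where a full segment does not fit on both sides are NaN.
    Lower values indicate a sharper spectral transition.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    cosine = np.full(num_frames, np.nan)
    for n in range(half_width, num_frames - half_width + 1):
        prev_mean = features[n - half_width:n].mean(axis=0)
        next_mean = features[n:n + half_width].mean(axis=0)
        norm = np.linalg.norm(prev_mean) * np.linalg.norm(next_mean)
        # The offset keeps the ratio finite when both segments are
        # (near-)silent, where the plain cosine would be 0/0.
        cosine[n] = np.dot(prev_mean, next_mean) / (norm + offset)
    return cosine

# Toy input: four frames with energy in band 0, then four in band 1.
toy = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4)
values = adjacent_segment_cosine(toy)
transition_frame = int(np.nanargmin(values))
```

In this toy input the lowest cosine falls at the frame where the spectrum switches bands, which is the kind of salient segment the abstract refers to.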


Pages: 48-52

Page count: 5
