Singing voice identification using spectral envelope estimation

Cited by: 26

Authors:

Bartsch, MA [1]

Wakefield, GH [1]

Affiliations:

[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2004 / Vol. 12 / No. 02

Funding:

National Science Foundation (USA);

Keywords:

music information retrieval; singer identification; spectral analysis; vocal tract transfer function;

DOI:

10.1109/TSA.2003.822637

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we present a spectrum-based system for singer identification that operates for the ideal case in which audio samples contain only the singer's voice. Our method begins with the computation of a robust estimate of the spectral envelope called the composite transfer function (CTF). The CTF is derived from the instantaneous amplitude and frequency of the sinusoidal partials which make up the vocal signal. Unlike traditional source-filter theory [1], the CTF does not explicitly separate the spectral characteristics of the vocal source and the vocal tract filter. The principal components of the CTFs are used as features for a quadratic classifier to identify singers. The approach is validated on a database containing samples from twelve classically trained singers. In cross validation experiments, test set accuracies of approximately 95% are found for a baseline case. The classifier's performance is not degraded when different vowels are included in classifier training and evaluation. Restricting the frequency range of the CTFs and using a test set containing samples extracted from solo performances of Italian arias reduces the test set accuracy to 70-80%.
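The classification stage described in the abstract (principal components of spectral envelopes fed to a quadratic classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the CTF estimation itself is not reproduced, the "envelopes" are synthetic stand-ins, and the function names (`fit_pca`, `fit_quadratic`, `demo`), the number of components, the regularization term, and the equal class priors are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project rows of X onto the principal directions."""
    return (X - mean) @ components.T

def fit_quadratic(Z, y, reg=1e-6):
    """Fit one Gaussian (mean, full covariance) per class."""
    model = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        # Small diagonal term (an assumption) keeps the covariance invertible.
        cov = np.cov(Zc, rowvar=False) + reg * np.eye(Z.shape[1])
        model[label] = (Zc.mean(axis=0), np.linalg.inv(cov),
                        np.linalg.slogdet(cov)[1])
    return model

def predict_quadratic(model, Z):
    """Pick the class with the highest Gaussian log-likelihood (equal priors)."""
    labels = list(model)
    scores = []
    for label in labels:
        mu, prec, logdet = model[label]
        d = Z - mu
        maha = np.einsum("ij,jk,ik->i", d, prec, d)  # squared Mahalanobis distance
        scores.append(-0.5 * (maha + logdet))
    return np.array(labels)[np.argmax(np.vstack(scores), axis=0)]

def demo(seed=0):
    """Train and test on synthetic envelopes; returns test accuracy."""
    rng = np.random.default_rng(seed)
    n_singers, n_per, n_bins = 3, 60, 64
    freqs = np.linspace(0, 1, n_bins)
    X, y = [], []
    for s in range(n_singers):
        # Hypothetical stand-in for a CTF: smooth per-singer envelope plus noise.
        envelope = np.cos(2 * np.pi * (s + 1) * freqs)
        X.append(envelope + 0.3 * rng.standard_normal((n_per, n_bins)))
        y.append(np.full(n_per, s))
    X, y = np.vstack(X), np.concatenate(y)
    idx = rng.permutation(len(y))
    train, test = idx[:120], idx[120:]
    mean, comps = fit_pca(X[train], n_components=5)
    model = fit_quadratic(project(X[train], mean, comps), y[train])
    pred = predict_quadratic(model, project(X[test], mean, comps))
    return float((pred == y[test]).mean())

if __name__ == "__main__":
    print("synthetic test accuracy:", demo())
```

The PCA basis is fitted on the training split only, so the held-out samples do not influence the features. The accuracy printed here reflects the synthetic data and says nothing about the figures reported in the abstract.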

Citation

Pages: 100-109

Page count: 10

References (37 in total)

[1] ADAMS NH, 2003, 340 U MICH DEP EECS

[2] BERENZWEIG AL, 2002, AES 22 INT C ESP FIN

[3] Bishop C. M., 1996, Neural networks for pattern recognition

[4] Boersma P., 1993, P I PHONETIC SCI, V17, P97, DOI 10.1371/JOURNAL.PONE.0069107

[5] Campbell, JP. Speaker recognition: A tutorial [J]. PROCEEDINGS OF THE IEEE, 1997, 85 (09): 1437-1462

[6] Cleveland, TE; Sundberg, J; Stone, RE. Long-term-average spectrum characteristics of country singers during speaking and singing [J]. JOURNAL OF VOICE, 2001, 15 (01): 54-60

[7] Cohen L., 1995, TIME FREQUENCY ANAL

[8] DAVIS, SB; MERMELSTEIN, P. COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04): 357-366

[9] DENNIS I, 1983, NUMER METH UNCON OPT

[10] Erickson, ML; Perry, S; Handel, S. Discrimination functions: Can they be used to classify singing voices? [J]. JOURNAL OF VOICE, 2001, 15 (04): 492-502
