On the use of nearest feature line for speaker identification

Cited by: 22

Authors:

Chen, K [1]

Wu, TY

Zhang, HJ

Affiliations:

[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England

[2] Peking Univ, Ctr Informat Sci, Natl Lab Machine Percept, Beijing 100871, Peoples R China

[3] Microsoft Res Asia, Sigma Ctr, Beijing 100080, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2002 / Vol. 23 / No. 14

Funding:

National Natural Science Foundation of China;

Keywords:

nearest feature line; speaker identification; dynamic time warping; vector quantization; nearest neighboring measure;

DOI:

10.1016/S0167-8655(02)00147-2

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As a new pattern classification method, nearest feature line (NFL) provides an effective way to tackle the sort of pattern recognition problems where only limited data are available for training. In this paper, we explore the use of NFL for speaker identification in terms of limited data and examine how the NFL performs in such a vexing problem of various mismatches between training and test. In order to speed up NFL in decision-making, we propose an alternative method for similarity measure. We have applied the improved NFL to speaker identification of different operating modes. Its text-dependent performance is better than the dynamic time warping (DTW) on the Ti46 corpus, while its computational load is much lower than that of DTW. Moreover, we propose an utterance partitioning strategy used in the NFL for better performance. For the text-independent mode, we employ the NFL to be a new similarity measure in vector quantization (VQ), which causes the VQ to perform better on the KING corpus. Some computational issues on the NFL are also discussed in this paper. (C) 2002 Elsevier Science B.V. All rights reserved.
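Note: the abstract names the nearest feature line (NFL) method without spelling out its distance. The sketch below illustrates the standard NFL idea only: a query point is projected onto the line through each pair of same-class prototypes, and the class owning the closest line wins. It is not the faster similarity measure proposed in the paper, which this record does not describe. The names `feature_line_distance` and `nfl_classify`, the dictionary layout of the prototypes, and the single-prototype fallback are assumptions made for illustration.

```python
from itertools import combinations
import math


def feature_line_distance(x, p1, p2):
    """Euclidean distance from query x to the line through prototypes p1 and p2."""
    direction = [b - a for a, b in zip(p1, p2)]
    offset = [xi - a for xi, a in zip(x, p1)]
    denom = sum(d * d for d in direction)
    if denom == 0.0:
        # Degenerate line (identical prototypes): fall back to point distance.
        return math.sqrt(sum(o * o for o in offset))
    # Position of the projection along the line; it may fall outside [0, 1],
    # i.e. the line extrapolates beyond the two prototypes.
    mu = sum(o * d for o, d in zip(offset, direction)) / denom
    residual = [o - mu * d for o, d in zip(offset, direction)]
    return math.sqrt(sum(r * r for r in residual))


def nfl_classify(x, prototypes):
    """Return (label, distance) for the class with the nearest feature line.

    prototypes: dict mapping a class label to a list of feature vectors.
    """
    best_label, best_dist = None, float("inf")
    for label, points in prototypes.items():
        if len(points) == 1:
            # Assumption: with one prototype there is no line, so use the point.
            candidates = [feature_line_distance(x, points[0], points[0])]
        else:
            candidates = [
                feature_line_distance(x, points[i], points[j])
                for i, j in combinations(range(len(points)), 2)
            ]
        for dist in candidates:
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


if __name__ == "__main__":
    speakers = {
        "a": [[0.0, 0.0], [2.0, 0.0]],
        "b": [[0.0, 3.0], [2.0, 3.0]],
    }
    print(nfl_classify([5.0, 0.5], speakers))
```

Because every pair of prototypes in a class defines a line, a class with n prototypes yields n(n-1)/2 lines, which is how the method stretches a small training set and also why its decision cost grows quickly with n.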

References

Pages: 1735-1746

Page count: 12

14 entries in total

[1]

[Anonymous], SPOKEN LANGUAGE PROC

[2] New LP-Derived Features for Speaker Identification [J].

Assaleh, Khaled T. ;

Mammone, Richard J. .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (04) :630-638

[3] A modified HME architecture for text-dependent speaker identification [J].

Chen, K ;

Xie, DH ;

Chi, HS .

IEEE TRANSACTIONS ON NEURAL NETWORKS, 1996, 7 (05) :1309-1313

[4] The NIST speaker recognition evaluation - Overview, methodology, systems, results, perspective [J].

Doddington, GR ;

Przybocki, MA ;

Martin, AF ;

Reynolds, DA .

SPEECH COMMUNICATION, 2000, 31 (2-3) :225-254

[5] CEPSTRAL ANALYSIS TECHNIQUE FOR AUTOMATIC SPEAKER VERIFICATION [J].

FURUI, S .

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1981, 29 (02) :254-272

[6]

Li SZ, 2000, IEEE T PATTERN ANAL, V22, P1335, DOI 10.1109/34.888719

[7] Face recognition based on nearest linear combinations [J].

Li, SZ .

1998 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, 1998, :839-844

[8] Content-based audio classification and retrieval using the nearest feature line method [J].

Li, SZ .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (05) :619-625

[9] ALGORITHM FOR VECTOR QUANTIZER DESIGN [J].

LINDE, Y ;

BUZO, A ;

GRAY, RM .

IEEE TRANSACTIONS ON COMMUNICATIONS, 1980, 28 (01) :84-95

[10]

Nolan F., 1983, The Phonetic Bases of Speaker Recognition
