Robust Spectral Features for Automatic Speaker Recognition in Mismatch Condition

Cited by: 10
Authors
Chougule, Sharada V. [1]
Chavan, Mahesh S. [2]
Affiliations
[1] Finolex Acad Management & Technol, Ratnagiri, Maharashtra, India
[2] KITs Coll Engn, Kolhapur, Maharashtra, India
Source
SECOND INTERNATIONAL SYMPOSIUM ON COMPUTER VISION AND THE INTERNET (VISIONNET'15) | 2015, Vol. 58
Keywords
MFCCs (Mel Frequency Cepstral Coefficients); LPCCs (Linear Predictive Cepstral Coefficients); NDSF (Normalized Dynamic Spectral Features)
DOI
10.1016/j.procs.2015.08.021
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The widespread use of automatic speaker recognition technology in real-world applications demands robustness against various realistic conditions. In this paper, a robust spectral feature set called NDSF (Normalized Dynamic Spectral Features) is proposed for automatic speaker recognition in mismatch conditions. Magnitude spectral subtraction is performed on the spectral features to compensate for additive noise. A spectral-domain modification is then performed using a time-difference approach followed by a Gaussianization non-linearity. Histogram normalization is applied to these dynamic spectral features to compensate for channel mismatch and for some non-linear effects introduced by handset transducers. Feature extraction using the proposed features is carried out for a text-independent automatic speaker recognition (identification) system. The performance of the proposed feature set is compared with that of conventional cepstral features (mel-frequency cepstral coefficients and linear prediction cepstral coefficients) under acoustic mismatch caused by the use of different sensors. Studies are performed on two databases: the multi-variability speaker recognition (MVSR) database developed by IIT-Guwahati and a multi-speaker continuous (Hindi) speech database (from the Department of Information Technology, Government of India). Experimental analysis shows that spectral-domain dynamic features enhance robustness by reducing additive noise and channel effects caused by sensor mismatch. The proposed NDSF features are found to be more robust than cepstral features on both datasets. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
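The processing chain named in the abstract (magnitude spectral subtraction, time differencing, then a normalization of the feature histogram) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give frame sizes, the noise estimator, the exact Gaussianization non-linearity, or the histogram normalization target. The sketch therefore assumes a Hamming-windowed FFT, a noise estimate taken from the first few frames, a one-frame lag, and a rank-based mapping to a standard normal distribution as a stand-in for the last two stages. All function names and parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Element-wise inverse CDF of the standard normal (stdlib, wrapped for arrays).
_inv_normal_cdf = np.vectorize(NormalDist().inv_cdf)


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Split the signal into Hamming-windowed frames and take |FFT| per frame."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)


def spectral_subtraction(mag, noise_frames=5, floor=0.01):
    """Subtract a noise magnitude estimate (assumed: mean of the leading frames).

    A small spectral floor keeps the result non-negative.
    """
    noise = mag[:noise_frames].mean(axis=0)
    return np.maximum(mag - noise, floor * mag)


def time_difference(spec, lag=1):
    """Dynamic features: difference between frames `lag` steps apart."""
    return spec[lag:] - spec[:-lag]


def histogram_normalize(feats):
    """Rank-based mapping of each feature dimension to a standard normal.

    Assumption: used here as a stand-in for the Gaussianization and
    histogram normalization stages, whose details the abstract omits.
    """
    n = feats.shape[0]
    ranks = feats.argsort(axis=0).argsort(axis=0)  # 0 .. n-1 within each column
    probs = (ranks + 0.5) / n                      # strictly inside (0, 1)
    return _inv_normal_cdf(probs)


def ndsf_like_features(signal):
    """Chain the three stages; the output has one row per frame difference."""
    mag = magnitude_spectrogram(signal)
    clean = spectral_subtraction(mag)
    dynamic = time_difference(clean)
    return histogram_normalize(dynamic)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(4000) / 8000.0
    noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)
    print(ndsf_like_features(noisy).shape)
```

Because the final stage forces every feature dimension toward the same target distribution, features from recordings made with different sensors become more directly comparable, which is the intuition behind using such normalization against channel mismatch.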
Pages: 272-279
Page count: 8