Feature Extraction Using Power-Law Adjusted Linear Prediction With Application to Speaker Recognition Under Severe Vocal Effort Mismatch

Cited by: 23
Authors
Saeidi, Rahim [1]
Alku, Paavo [1]
Baeckstroem, Tom [2]
Affiliations
[1] Aalto Univ, Dept Signal Proc & Acoust, FI-00076 Aalto, Finland
[2] Int Audio Labs Erlangen, D-91058 Erlangen, Germany
Funding
Academy of Finland;
Keywords
Speaker recognition; linear prediction; power-law; vocal effort; shouting; mismatch; ACOUSTIC FEATURES; SPEECH; NOISE; VERIFICATION; IDENTIFICATION; COMPENSATION; DYNAMICS; MODEL;
DOI
10.1109/TASLP.2015.2493366
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Linear prediction is one of the most established techniques in signal estimation, and it is widely used in speech signal processing. It has long been understood that the nerve firing rate of the human auditory system can be approximated by a power-law non-linearity, and this has motivated the use of perceptual linear prediction for extracting acoustic features in a variety of speech processing applications. In this paper, we revisit the application of the power-law non-linearity in speech spectrum estimation by compressing or expanding the power spectrum in autocorrelation-based linear prediction. The development of the so-called LP-alpha is motivated by a desire to obtain spectral features that present less mismatch than those from conventionally used spectrum estimation methods when speech of normal loudness is compared to speech under vocal effort. The effectiveness of the proposed approach is demonstrated in a speaker recognition task conducted under severe vocal effort mismatch, comparing shouted versus normal speech.
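As a rough illustration of the idea in the abstract, the sketch below raises a frame's power spectrum to an exponent before converting it to an autocorrelation sequence and solving the usual linear prediction equations. It is not the authors' implementation: the function name `lp_alpha`, the parameters `order`, `alpha`, and `n_fft`, the Hamming window, and the default values are all assumptions made for this example, and the frame is assumed to be non-silent.

```python
import numpy as np


def lp_alpha(frame, order=20, alpha=1.0, n_fft=512):
    """Hypothetical sketch: LP coefficients from a power-law adjusted power spectrum.

    alpha < 1 compresses the power spectrum, alpha > 1 expands it, and
    alpha = 1 reduces to conventional autocorrelation-based LP.
    Assumes a non-silent frame (non-zero energy).
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))  # window choice is an assumption

    # Power spectrum of the frame, then the power-law adjustment.
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    adjusted = power ** alpha

    # Inverse FFT of a power spectrum gives an autocorrelation sequence.
    r = np.fft.irfft(adjusted, n_fft)[: order + 1]

    # Levinson-Durbin recursion for the prediction polynomial a[0..order], a[0] = 1.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error energy
    return a, err
```

Calling `lp_alpha(frame, order=20, alpha=0.5)` on a speech frame would return the prediction polynomial and residual energy for a compressed spectrum; which exponent suits a given task is not something this sketch can say.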
Pages: 42-53
Page count: 12