An improved voice activity detection algorithm for GSM adaptive multi-rate speech codec based on wavelet and support vector machine

Cited by: 0
Authors
Chen, Shi-Huang
Chang, Yaotsu
Truong, T. K.
Source
New Trends in Applied Artificial Intelligence, Proceedings | 2007 / Vol. 4570
Keywords
GSM AMR; VAD; wavelet; support vector machine
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an improved voice activity detection (VAD) algorithm for controlling discontinuous transmission (DTX) in the GSM adaptive multi-rate (AMR) speech codec. First, the original IIR filter bank and the open-loop pitch detector are reimplemented using a wavelet filter bank and a wavelet-based pitch detection algorithm, respectively. The wavelet filter bank divides the input speech signal into nine frequency bands so that the signal level in each sub-band can be calculated. In addition, the background noise in each sub-band is estimated with a wavelet de-noising method. The wavelet filter bank is also used to detect correlated complex signals such as music. A support vector machine (SVM) is then trained to obtain an optimized non-linear VAD decision rule from the sub-band power, noise level, pitch period, tone flag, and complex-signal warning flag of the input speech. With the trained SVM, the proposed VAD algorithm produces more accurate detection results. Experiments on the Aurora speech database show that the proposed algorithm delivers VAD performance superior to AMR VAD Option 1 and comparable to AMR VAD Option 2.
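The sub-band level computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' filter bank: the Haar wavelet, the 8-level dyadic split (8 detail bands plus 1 approximation band, giving 9 sub-bands), the 256-sample frame, and the names haar_step and subband_energies are all assumptions made for illustration only.

```python
import math


def haar_step(x):
    """One Haar analysis step: return (approximation, detail), each half as long as x."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def subband_energies(frame, levels=8):
    """Split a frame into levels + 1 dyadic sub-bands and return each band's energy.

    Assumption: a dyadic Haar decomposition stands in for the paper's wavelet
    filter bank, whose actual structure is not given in the abstract.
    Bands are ordered from lowest to highest frequency.
    """
    if len(frame) == 0 or len(frame) % (1 << levels) != 0:
        raise ValueError("frame length must be a positive multiple of 2**levels")
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        # Detail coefficients of the first step cover the highest frequencies.
        energies.append(sum(d * d for d in detail))
    # What remains is the lowest-frequency approximation band.
    energies.append(sum(a * a for a in approx))
    energies.reverse()
    return energies


if __name__ == "__main__":
    # Hypothetical 256-sample frame: a slow sine plus a fast alternating component.
    frame = [math.sin(2 * math.pi * i / 64) + 0.1 * (-1) ** i for i in range(256)]
    for band, energy in enumerate(subband_energies(frame)):
        print(band, round(energy, 4))
```

Because the Haar transform is orthonormal, the nine band energies sum to the energy of the input frame. A feature vector for a classifier such as an SVM could be built from these energies together with the other quantities listed in the abstract.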
Pages: 915-924
Page count: 10