A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition

Cited by: 0
Authors
Morales-Cordovilla, Juan A. [1]
Cabanas-Molero, Pablo
Peinado, Antonio M. [1]
Sanchez, Victoria [1]
Affiliations
[1] Univ Granada, Dept Teoria Senal Telemat & Comunicac, E-18071 Granada, Spain
Source
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES | 2012 / Vol. 328
Keywords
pitch extractor; pitch line; CASA; DTW; noise; robust speech recognition;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a robust pitch extractor for Automatic Speech Recognition, based on selecting pitch lines of a tonegram (a representation of the different pitch energies at each frame time). First, the tonegram and its maximum energy regions are extracted, and a Dynamic Time Warping algorithm finds the most energetic trajectories, or pitch lines, from these regions. A second stage estimates the tonegram of the most energetic lines by applying Computational Auditory Scene Analysis rules which reject and group octave-related lines. The mean pitch of the speaker is then estimated, and the final pitch is obtained by rejecting lines which are far from the mean pitch. The proposed pitch extractor is evaluated in a novel way: by means of the word accuracy of a Missing Data recognizer on the Aurora-2 database.
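The idea of finding the most energetic trajectory through a tonegram can be illustrated with a minimal sketch. This is not the paper's Dynamic Time Warping procedure; it is a simplified dynamic-programming stand-in under stated assumptions: the tonegram is a list of frames, each holding one energy value per pitch candidate, and a hypothetical `max_jump` parameter limits how far the pitch bin may move between consecutive frames. The function name and data layout are illustrative only.

```python
def track_pitch_line(tonegram, max_jump=1):
    """Return (path, total_energy) of the most energetic trajectory.

    tonegram: list of frames, each a list of energies, one per pitch
              candidate (assumed layout, not taken from the paper).
    max_jump: largest allowed change in pitch-bin index between
              consecutive frames (assumed continuity constraint).
    """
    if not tonegram:
        return [], 0.0
    n_bins = len(tonegram[0])
    score = list(tonegram[0])   # best accumulated energy ending in each bin
    back = []                   # back-pointers, one list per later frame
    for frame in tonegram[1:]:
        new_score, pointers = [], []
        for b in range(n_bins):
            lo = max(0, b - max_jump)
            hi = min(n_bins, b + max_jump + 1)
            # Best reachable predecessor bin for bin b
            best_prev = max(range(lo, hi), key=lambda p: score[p])
            new_score.append(score[best_prev] + frame[b])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Backtrack from the bin with the highest accumulated energy
    end = max(range(n_bins), key=lambda b: score[b])
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[end]


# Toy tonegram: 3 frames, 3 pitch candidates, energy rising along a diagonal
toy = [[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]]
path, energy = track_pitch_line(toy, max_jump=1)
```

With `max_jump=1` the sketch follows the rising diagonal of the toy tonegram; with `max_jump=0` it is forced to stay in a single bin. A full extractor along the lines of the abstract would additionally need the octave-grouping rules and the mean-pitch rejection stage, which are not modeled here.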
Pages: 197 - 206
Number of pages: 10
Related papers
12 items in total
  • [1] Decoding speech in the presence of other sources
    Barker, JP
    Cooke, MP
    Ellis, DPW
    [J]. SPEECH COMMUNICATION, 2005, 45 (01) : 5 - 25
  • [2] Robust automatic speech recognition with missing and unreliable acoustic data
    Cooke, M
    Green, P
    Josifovski, L
    Vizinho, A
    [J]. SPEECH COMMUNICATION, 2001, 34 (03) : 267 - 285
  • [3] YIN, a fundamental frequency estimator for speech and music
    de Cheveigné, A
    Kawahara, H
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2002, 111 (04) : 1917 - 1930
  • [4] Gonzalez S., 2011, EUSIPCO
  • [5] Exploiting correlogram structure for robust speech recognition with multiple speech sources
    Ma, Ning
    Green, Phil
    Barker, Jon
    Coy, Andre
    [J]. SPEECH COMMUNICATION, 2007, 49 (12) : 874 - 891
  • [6] Morales-Cordovilla J.A., 2011, THESIS U GRANADA SPA
  • [7] Morales-Cordovilla JA, 2011, INT CONF ACOUST SPEE, P4808
  • [8] Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition
    Morales-Cordovilla, Juan A.
    Peinado, Antonio M.
    Sanchez, Victoria
    Gonzalez, Jose A.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (03) : 640 - 651
  • [9] Pearce D., 2000, P 6 INT C SPOK LANG, V4, P29
  • [10] Peinado A., 2006, SPEECH RECOGNITION D