Speech Recognition using Long-Term Phase Information

Cited by: 0
Authors
Yamamoto, Kazumasa [1]
Sueyoshi, Eiichi [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
speech recognition; phase information; long-term analysis; group delay;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current speech recognition systems mainly use amplitude spectrum-based features such as MFCC as acoustic feature parameters, while discarding phase spectral information. The results of perceptual experiments, however, suggest that phase spectral information based on long-term analysis includes certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay, and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, resulting in greater than 90% word or digit accuracy in each experiment. The experimental results confirmed that a long-term phase spectrum includes sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and the long-term group delay cepstrum outperformed the MFCC baseline by 20% relative for clean speech.
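As a rough illustration of the group delay quantity the abstract refers to, the sketch below computes the group delay spectrum of a single analysis frame using the standard identity τ(ω) = (X_R Y_R + X_I Y_I) / |X|², where X is the Fourier transform of x[n] and Y is the Fourier transform of n·x[n]. This is a generic textbook formulation, not the authors' feature extraction: the function name `group_delay`, the `eps` floor, and the frame length are assumptions made for illustration, and the record does not state the paper's window length or how its delta phase and analytic group delay features are defined.

```python
import numpy as np

def group_delay(frame, n_fft=None, eps=1e-10):
    """Group delay spectrum (in samples) of one frame.

    Hypothetical helper for illustration; uses the identity
    tau(w) = Re(Y(w) / X(w)), with Y the DFT of n * x[n].
    """
    x = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(x)
    n = np.arange(len(x))

    X = np.fft.rfft(x, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)   # spectrum of n * x[n]

    # Floor the power spectrum to avoid division by zero at spectral nulls
    # (the floor value is an assumption, not taken from the paper).
    power = np.maximum(X.real ** 2 + X.imag ** 2, eps)
    return (X.real * Y.real + X.imag * Y.imag) / power

# Sanity check: an impulse delayed by 5 samples has a flat group delay of 5.
frame = np.zeros(64)
frame[5] = 1.0
delay = group_delay(frame)
```

A "long-term" analysis in the sense of the abstract would presumably apply such a computation to frames longer than the usual short-time window, but the exact length is not given in this record.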
Pages: 1189-1192
Number of pages: 4