Speech Recognition using Long-Term Phase Information

Cited by: 0
Authors
Yamamoto, Kazumasa [1]
Sueyoshi, Eiichi [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
speech recognition; phase information; long-term analysis; group delay
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current speech recognition systems mainly use amplitude spectrum-based features such as MFCC as acoustic feature parameters, while discarding phase spectral information. The results of perceptual experiments, however, suggest that phase spectral information based on long-term analysis includes certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay, and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, resulting in word or digit accuracy greater than 90% in each experiment. The experimental results confirmed that a long-term phase spectrum includes sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and the long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% for clean speech.
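The abstract names group delay as the basis of its phase features. As a minimal sketch only, the following Python code computes the group delay of a single frame using the standard textbook identity that avoids phase unwrapping. The function name `group_delay`, the frame length, and the `eps` floor are assumptions for illustration; the record does not give the authors' window length or their exact feature definition, so this should not be read as their method.

```python
import numpy as np

def group_delay(frame, n_fft=None, eps=1e-10):
    """Group delay (in samples) of one frame, computed without phase unwrapping.

    Uses the identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2,
    where X = FFT(x[n]) and Y = FFT(n * x[n]).
    """
    x = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(x)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of the frame
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of the time-weighted frame
    power = np.maximum(np.abs(X) ** 2, eps)  # floor to avoid division by zero
    return (X.real * Y.real + X.imag * Y.imag) / power

# Hypothetical usage: a unit impulse delayed by 5 samples.
frame = np.zeros(64)
frame[5] = 1.0
tau = group_delay(frame)
```

For a pure delay of 5 samples the spectrum has constant magnitude and linear phase, so every frequency bin should report a group delay of 5. Real speech frames have spectral zeros where the power floor matters, which is one reason practical systems often smooth or modify this quantity.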
Pages: 1189-1192
Number of pages: 4
Related papers
50 in total
  • [11] Associative information for speech recognition using semantic attributes
    Sekiguchi, Y
    Kanbe, T
    Yamaguchi, T
    Suzuki, Y
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 1997, 80 (02): : 63 - 77
  • [12] Depressive symptoms affect short- and long-term speech recognition outcome in cochlear implant users
    Heinze-Koehler, Katharina
    Lehmann, Effi Katharina
    Hoppe, Ulrich
    EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY, 2021, 278 (02) : 345 - 351
  • [13] Deep Long Short-Term Memory Networks for Speech Recognition
    Chien, Jen-Tzung
    Misbullah, Alim
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [14] Mechanisms of long-term repetition priming in recognising speech in noise
    Gleason, Liam J.
    Francis, Wendy S.
    MEMORY, 2024, 32 (02) : 237 - 251
  • [15] Long-term outcomes on spatial hearing, speech recognition and receptive vocabulary after sequential bilateral cochlear implantation in children
    Sparreboom, Marloes
    Langereis, Margreet C.
    Snik, Ad F. M.
    Mylanus, Emmanuel A. M.
    RESEARCH IN DEVELOPMENTAL DISABILITIES, 2015, 36 : 328 - 337
  • [16] Long Short-Term Memory Networks for Noise Robust Speech Recognition
    Woellmer, Martin
    Sun, Yang
    Eyben, Florian
    Schuller, Bjoern
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2966 - 2969
  • [17] A PRIORITIZED GRID LONG SHORT-TERM MEMORY RNN FOR SPEECH RECOGNITION
    Hsu, Wei-Ning
    Zhang, Yu
    Glass, James
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 467 - 473
  • [18] Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Liu, Zhilei
    Guan, Haotian
    MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 14 - 25
  • [19] An isolated word speech recognition using fusion of auditory and visual information
    Shintani, A
    Ogihara, A
    Doi, N
    Takamatsu, S
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1996, E79A (06) : 777 - 783
  • [20] SPEECH RECOGNITION USING HMM BASED ON FUSION OF VISUAL AND AUDITORY INFORMATION
    SHINTANI, A
    OGIHARA, A
    YAMAGUCHI, Y
    HAYASHI, Y
    FUKUNAGA, K
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1994, E77A (11) : 1875 - 1878