Speech Recognition using Long-Term Phase Information

Cited by: 0
Authors
Yamamoto, Kazumasa [1]
Sueyoshi, Eiichi [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
speech recognition; phase information; long-term analysis; group delay
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current speech recognition systems mainly use amplitude spectrum-based features such as MFCC as acoustic feature parameters and discard phase spectral information. The results of perceptual experiments, however, suggest that phase spectral information based on long-term analysis carries certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay, and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, yielding greater than 90% word or digit accuracy in each experiment. The experimental results confirm that a long-term phase spectrum includes sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and the long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% for clean speech.
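The abstract names two ways of obtaining a group delay from the phase spectrum. The sketch below illustrates both on a single frame. It is a minimal illustration and not the authors' feature extraction: the function names, the naive DFT, and the `eps` regularizer are assumptions made here, and the analysis window length (the "long-term" part) is left to the caller. The first function takes the negative frequency difference of the unwrapped phase; the second uses the standard closed form tau(k) = (X_R Y_R + X_I Y_I) / |X|^2, where Y is the DFT of n * x[n], which avoids phase unwrapping.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def group_delay_from_phase(x):
    """Group delay as the negative frequency difference of the unwrapped phase."""
    n_pts = len(x)
    half = n_pts // 2 + 1  # bins from 0 up to the Nyquist frequency
    phase = [cmath.phase(c) for c in dft(x)[:half]]

    # Unwrap: force each bin-to-bin phase step into [-pi, pi].
    unwrapped = [phase[0]]
    for p in phase[1:]:
        step = p - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        unwrapped.append(unwrapped[-1] + step)

    d_omega = 2 * math.pi / n_pts  # frequency spacing between adjacent bins
    return [-(unwrapped[k + 1] - unwrapped[k]) / d_omega for k in range(half - 1)]


def group_delay_analytic(x, eps=1e-12):
    """Group delay via (X_R*Y_R + X_I*Y_I) / |X|^2, with Y = DFT of n * x[n].

    eps is a hypothetical regularizer that guards against division by zero
    at spectral nulls.
    """
    n_pts = len(x)
    half = n_pts // 2 + 1
    big_x = dft(x)
    big_y = dft([n * v for n, v in enumerate(x)])
    return [
        (big_x[k].real * big_y[k].real + big_x[k].imag * big_y[k].imag)
        / (abs(big_x[k]) ** 2 + eps)
        for k in range(half)
    ]
```

For a unit impulse delayed by d samples, both functions should return a group delay of about d samples in every bin, which makes a convenient sanity check. The difference-based version needs the true phase step between bins to stay below pi in magnitude, so it is more fragile on real speech than the closed form.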
Pages: 1189-1192
Number of pages: 4
Related papers
50 in total
  • [1] Efficient voice activity detection algorithms using long-term speech information
    Ramírez, J
    Segura, JC
    Benítez, C
    de la Torre, A
    Rubio, A
    SPEECH COMMUNICATION, 2004, 42 (3-4) : 271 - 287
  • [2] SPEAKER CLUSTERING USING VECTOR REPRESENTATION WITH LONG-TERM FEATURE FOR LECTURE SPEECH RECOGNITION
    Huang, Chien-Lin
    Hori, Chiori
    Kashioka, Hideki
    Ma, Bin
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 3532 - 3536
  • [3] Speech-recognition performance after long-term hearing aid use
    Shanks, Janet E.
    Wilson, Richard H.
    Stelmachowicz, Patricia
    Bratt, Gene W.
    Williams, David W.
    JOURNAL OF THE AMERICAN ACADEMY OF AUDIOLOGY, 2007, 18 (04) : 292 - 303
  • [4] Emotion Recognition From Speech and Text using Long Short-Term Memory
    Venkateswarlu, Sonagiri China
    Jeevakala, Siva Ramakrishna
    Kumar, Naluguru Udaya
    Munaswamy, Pidugu
    Pendyala, Dhanalaxmi
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2023, 13 (04) : 11166 - 11169
  • [5] Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Zhang, Linjuan
    Guan, Haotian
    Li, Xiangang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1611 - 1615
  • [6] Effects of long-term training on aided speech-recognition performance in noise in older adults
    Burk, Matthew H.
    Humes, Larry E.
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2008, 51 (03) : 759 - 771
  • [7] The effect of long-term deafness on speech recognition in postlingually deafened adult CLARION® cochlear implant users
    Geier, L
    Barker, M
    Fisher, L
    Opie, J
    ANNALS OF OTOLOGY RHINOLOGY AND LARYNGOLOGY, 1999, 108 (04) : 80 - 83
  • [8] Depressive symptoms affect short- and long-term speech recognition outcome in cochlear implant users
    Katharina Heinze-Köhler
    Effi Katharina Lehmann
    Ulrich Hoppe
    European Archives of Oto-Rhino-Laryngology, 2021, 278 : 345 - 351
  • [9] A Speech Recognition Method Using Long Short-Term Memory Network in Low Resources
    Shu F.
    Qu D.
    Zhang W.
    Zhou L.
    Guo W.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (10) : 120 - 127
  • [10] Modeling Speaker Variability Using Long Short-Term Memory Networks for Speech Recognition
    Li, Xiangang
    Wu, Xihong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1086 - 1090