Speech Recognition using Long-Term Phase Information

Cited by: 0
Authors
Yamamoto, Kazumasa [1]
Sueyoshi, Eiichi [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
speech recognition; phase information; long-term analysis; group delay;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current speech recognition systems mainly use amplitude spectrum-based features such as MFCC as acoustic feature parameters and discard phase spectral information. The results of perceptual experiments, however, suggest that phase spectral information based on long-term analysis includes certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, yielding word or digit accuracy greater than 90% in each experiment. The experimental results confirmed that a long-term phase spectrum includes sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and the long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% on clean speech.
Pages: 1189 - 1192
Page count: 4