Speech Recognition using Long-Term Phase Information

Cited by: 0

Authors:

Yamamoto, Kazumasa [1]

Sueyoshi, Eiichi [1]

Nakagawa, Seiichi [1]

Affiliations:

[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speech recognition; phase information; long-term analysis; group delay

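DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: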

Current speech recognition systems mainly use amplitude spectrum-based features such as MFCC as acoustic feature parameters and discard phase spectral information. The results of perceptual experiments, however, suggest that phase spectral information based on long-term analysis carries certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay, and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, yielding word or digit accuracy greater than 90% in each experiment. The experimental results confirmed that a long-term phase spectrum includes sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and the long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% for clean speech.
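
For readers unfamiliar with group delay, the quantity named in the abstract can be sketched as follows. This is a minimal illustration of the standard textbook identity for the group delay of a finite frame (the negative derivative of the phase spectrum with respect to frequency, computed without phase unwrapping). It is not the authors' feature extraction pipeline; the function name `group_delay`, the parameter names, and the 64-sample frame are assumptions made for the example.

```python
import numpy as np

def group_delay(frame, n_fft=None, eps=1e-10):
    """Group delay (in samples) of one analysis frame.

    Uses the identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where
    X = DFT(x[n]) and Y = DFT(n * x[n]). The small eps (an assumption
    of this sketch) guards against division by zero at spectral nulls.
    """
    frame = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(frame)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of the frame
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of the time-weighted frame
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check: a unit impulse delayed by 5 samples has a constant
# group delay of 5 samples at every frequency bin.
frame = np.zeros(64)
frame[5] = 1.0
gd = group_delay(frame)
```

A long-term analysis in the sense of the abstract would apply such a computation to frames longer than the usual short-time window; the frame length the authors used is not stated in this record.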


Pages: 1189-1192

Number of pages: 4

