Speech Recognition using Long-Term Phase Information

Times cited: 0

Authors:

Yamamoto, Kazumasa [1]

Sueyoshi, Eiichi [1]

Nakagawa, Seiichi [1]

Affiliation:

[1] Toyohashi University of Technology, Department of Computer Science and Engineering, Toyohashi, Aichi, Japan

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speech recognition; phase information; long-term analysis; group delay

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Current speech recognition systems mainly use amplitude-spectrum-based features such as MFCC as acoustic feature parameters and discard phase spectral information. Perceptual experiments, however, have suggested that phase spectral information obtained through long-term analysis contains some linguistic information. In this paper, we propose using phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter, treated as a group delay, and analytic group delay features. We performed isolated word and continuous digit recognition experiments, and word or digit accuracy exceeded 90% in each experiment. These results confirm that the long-term phase spectrum contains sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% for clean speech.
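The abstract names two group delay parameterizations but gives no formulas. The following is a minimal sketch that assumes the standard definition of group delay, the negative derivative of the unwrapped phase with respect to frequency. It computes the group delay in two ways:

(a) a delta-phase estimate, taken from the phase difference between adjacent DFT bins;
(b) an analytic estimate, tau(k) = (X_R(k) Y_R(k) + X_I(k) Y_I(k)) / |X(k)|^2, where Y is the DFT of n x[n]. This form avoids phase unwrapping.

The function names, the naive DFT and the 16-sample frame are illustrative assumptions, not the authors' implementation. The paper's long-term analysis would apply the same computation to a much longer window, and the abstract does not state that window's length.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def wrap(angle: float) -> float:
    """Wrap a phase difference into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def delta_phase_group_delay(x: List[float]) -> List[float]:
    """Group delay (in samples) from the negative phase difference of adjacent DFT bins."""
    spectrum = dft(x)
    bin_spacing = 2 * math.pi / len(x)  # radians per DFT bin
    phases = [cmath.phase(c) for c in spectrum]
    # Wrapping each adjacent difference acts as local phase unwrapping.
    return [-wrap(phases[k + 1] - phases[k]) / bin_spacing for k in range(len(x) - 1)]


def analytic_group_delay(x: List[float], eps: float = 1e-12) -> List[float]:
    """Group delay (in samples) as (X_R*Y_R + X_I*Y_I) / |X|^2, with Y = DFT(n * x[n])."""
    spec_x = dft(x)
    spec_y = dft([n * value for n, value in enumerate(x)])
    delays = []
    for x_k, y_k in zip(spec_x, spec_y):
        power = abs(x_k) ** 2
        # Bins with (near-)zero energy have undefined group delay; report 0.0 there.
        if power > eps:
            delays.append((x_k.real * y_k.real + x_k.imag * y_k.imag) / power)
        else:
            delays.append(0.0)
    return delays


if __name__ == "__main__":
    frame = [0.0] * 16
    frame[3] = 1.0  # unit impulse delayed by 3 samples
    # A pure delay has constant group delay, so every value should be close to 3.0.
    print([round(t, 6) for t in analytic_group_delay(frame)])
    print([round(t, 6) for t in delta_phase_group_delay(frame)])
```

A delayed impulse is a convenient check because its group delay equals the delay at every frequency. On real speech, the delta-phase estimate is only reliable when the bins are close enough that adjacent phase differences stay within [-pi, pi]. Avoiding that constraint is one motivation for the analytic form.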
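The abstract reports that combining the likelihoods of MFCC and long-term group delay cepstrum models beats the MFCC baseline, but it does not state the combination rule. This sketch assumes a weighted sum of per-hypothesis log-likelihoods, which is a common choice. The weight `weight` is hypothetical and would normally be tuned on development data. The word labels and scores below are made up for illustration.

```python
from typing import Dict


def combine_scores(
    mfcc_loglik: Dict[str, float],
    gd_loglik: Dict[str, float],
    weight: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical linear fusion: (1 - weight) * MFCC + weight * group delay log-likelihood."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return {
        word: (1.0 - weight) * mfcc_loglik[word] + weight * gd_loglik[word]
        for word in mfcc_loglik
    }


def recognize(
    mfcc_loglik: Dict[str, float],
    gd_loglik: Dict[str, float],
    weight: float = 0.5,
) -> str:
    """Return the hypothesis with the highest fused score."""
    scores = combine_scores(mfcc_loglik, gd_loglik, weight)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up log-likelihoods for two hypothetical isolated-word hypotheses.
    mfcc = {"ichi": -100.0, "ni": -98.0}
    group_delay = {"ichi": -90.0, "ni": -97.0}
    print(recognize(mfcc, group_delay, weight=0.0))  # MFCC only: prints ni
    print(recognize(mfcc, group_delay, weight=0.5))  # fused: prints ichi
```

Setting `weight` to 0 recovers the MFCC-only decision. The weight therefore directly controls how strongly the phase-based stream can override the amplitude-based one.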


Pages: 1189-1192

Number of pages: 4

50 entries in total; entries 21-30 are listed below:

[21] Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition
Ueno, Sei
Lee, Akinobu
Kawahara, Tatsuya
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 3924-3933
[22] Using speech recognition and intelligent search tools to enhance information accessibility
Bain, Keith
Hines, Jason
Lingras, Pawan
Qin, Yumei
UNIVERSAL ACCESS IN HUMAN-COMPUTER INTERACTION: APPLICATIONS AND SERVICES, PT 3, PROCEEDINGS, 2007: 214+
[23] Analysis and Recognition of NAM Speech Using HMM Distances and Visual Information
Heracleous, Panikos
Tran, Viet-Anh
Nagai, Takayuki
Shikano, Kiyohiro
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): 1528-1538
[24] Approximated mutual information training for speech recognition using myoelectric signals
Guo, Hua J.
Chan, A. D. C.
2006 28TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-15, 2006: 96-99
[25] Spoofing Speech Detection Using Modified Relative Phase Information
Wang, Longbiao
Nakagawa, Seiichi
Zhang, Zhaofeng
Yoshida, Yohei
Kawakami, Yuta
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (04): 660-670
[26] Error Correction Using Long Context Match for Smartphone Speech Recognition
Liang, Yuan
Iwano, Koji
Shinoda, Koichi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (11): 1932-1942
[27] Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition
Oruh, Jane
Viriri, Serestina
Adegun, Adekanmi
IEEE ACCESS, 2022, 10: 30069-30079
[28] LATTICE RESCORING STRATEGIES FOR LONG SHORT TERM MEMORY LANGUAGE MODELS IN SPEECH RECOGNITION
Kumar, Shankar
Nirschl, Michael
Holtmann-Rice, Daniel
Liao, Hank
Suresh, Ananda Theertha
Yu, Felix
2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017: 165-172
[29] Post-error Correction in Automatic Speech Recognition Using Discourse Information
Kang, Sangwoo
Kim, Ji-Hwan
Seo, Jungyun
ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (02): 53-56
[30] On the importance of phase in human speech recognition
Shi, Guangji
Shanechi, Maryam Modir
Aarabi, Parham
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05): 1867-1874