Speech Recognition using Long-Term Phase Information

Times cited: 0

Authors:

Yamamoto, Kazumasa [1]

Sueyoshi, Eiichi [1]

Nakagawa, Seiichi [1]

Affiliation:

[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speech recognition; phase information; long-term analysis; group delay;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current speech recognition systems mainly use amplitude-spectrum-based features such as MFCCs as acoustic feature parameters and discard phase spectral information. Perceptual experiments, however, have suggested that phase spectra obtained by long-term analysis carry certain linguistic information. In this paper, we propose using phase features based on long-term analysis for speech recognition. We use two types of parameters: a delta-phase parameter, used as a group delay, and analytic group delay features. In isolated word and continuous digit recognition experiments, word and digit accuracies exceeded 90% in every case. These results confirm that the long-term phase spectrum contains sufficient information for recognizing speech. Furthermore, combining the likelihoods from MFCC and long-term group delay cepstrum features outperformed the MFCC baseline by a relative 20% on clean speech.
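The abstract names two kinds of phase-based parameters without defining them. As an illustration only, the sketch below computes group delay (the negative frequency derivative of phase) from a long analysis frame in two standard ways: as a wrapped phase difference between adjacent FFT bins (a "delta phase" approximation) and with the analytic identity tau = (X_R Y_R + X_I Y_I) / |X|^2, where Y is the FFT of n·x[n]. It also shows a weighted sum of log-likelihoods as one possible way to combine two recognizers. The function names, the Hamming window, the frame length and hop, and the fusion weight are all hypothetical choices for this sketch; the abstract does not give the authors' exact feature definitions, their group delay cepstrum, or their fusion rule.

```python
import numpy as np
from typing import Optional


def long_term_frames(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Cut a 1-D signal into Hamming-windowed frames of frame_len samples.

    For "long-term" analysis frame_len is assumed to be much longer than the
    usual 20-30 ms ASR frame; the length used in the paper is not stated here.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:  # zero-pad very short inputs to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([signal[i * hop:i * hop + frame_len] * window
                     for i in range(n_frames)])


def delta_phase_group_delay(frame: np.ndarray, n_fft: Optional[int] = None) -> np.ndarray:
    """Approximate group delay (in samples) from adjacent-bin phase differences."""
    n = n_fft if n_fft is not None else len(frame)
    spectrum = np.fft.rfft(frame, n)
    # angle(X[k+1] * conj(X[k])) is the phase difference wrapped to [-pi, pi]
    dphi = np.angle(spectrum[1:] * np.conj(spectrum[:-1]))
    # tau = -dphi / domega, where the bin spacing is domega = 2*pi/N
    return -dphi / (2.0 * np.pi / n)


def analytic_group_delay(frame: np.ndarray, n_fft: Optional[int] = None) -> np.ndarray:
    """Group delay (in samples) via tau = (X_R*Y_R + X_I*Y_I) / |X|^2, Y = FFT(n*x[n])."""
    n = n_fft if n_fft is not None else len(frame)
    X = np.fft.rfft(frame, n)
    Y = np.fft.rfft(np.arange(len(frame)) * frame, n)
    power = np.abs(X) ** 2
    # small floor to avoid division by zero at exact spectral nulls
    eps = 1e-12 * max(power.max(), 1.0)
    return (X.real * Y.real + X.imag * Y.imag) / (power + eps)


def combine_log_likelihoods(ll_mfcc, ll_gd, weight: float = 0.5) -> np.ndarray:
    """Weighted sum of per-hypothesis log-likelihoods from two feature streams."""
    return (1.0 - weight) * np.asarray(ll_mfcc, float) + weight * np.asarray(ll_gd, float)
```

For a unit impulse delayed by d samples (with d below half the FFT length), both estimators return d at every bin, which makes a convenient sanity check. The analytic form needs no phase differencing at all, whereas the delta-phase version relies on the true bin-to-bin phase step staying within [-pi, pi]; both can spike near spectral nulls. With `combine_log_likelihoods`, the recognized hypothesis would be the `np.argmax` of the fused scores.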


Pages: 1189-1192

Page count: 4
