Speech Recognition using Long-Term Phase Information

Cited by: 0

Authors:

Yamamoto, Kazumasa [1]

Sueyoshi, Eiichi [1]

Nakagawa, Seiichi [1]

Affiliations:

[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speech recognition; phase information; long-term analysis; group delay;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current speech recognition systems mainly use amplitude-spectrum-based features such as MFCC as acoustic feature parameters and discard phase spectral information. Perceptual experiments, however, have suggested that phase spectral information based on long-term analysis carries certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay, and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, yielding word or digit accuracy greater than 90% in each experiment. The experimental results confirmed that a long-term phase spectrum includes sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% for clean speech.
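
For readers unfamiliar with the group delay mentioned in the abstract, the following is a minimal sketch of the standard textbook computation, not the authors' feature extraction pipeline. It uses the identity tau(w) = Re(Y(w) X*(w)) / |X(w)|^2, where X is the spectrum of the frame x[n] and Y is the spectrum of n·x[n], which avoids explicit phase unwrapping. The function name `group_delay`, the `eps` floor, and the 64-sample test frame are illustrative assumptions; the record does not state the frame lengths or smoothing used in the paper.

```python
import numpy as np

def group_delay(frame, n_fft=None, eps=1e-10):
    """Group delay (in samples) of one frame: tau(w) = -d(phase)/dw.

    Hypothetical helper for illustration only; computed without phase
    unwrapping as Re(Y * conj(X)) / |X|^2, with Y the FFT of n * x[n].
    """
    x = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(x)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)          # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)      # spectrum of n * x[n]
    # Floor the power spectrum to avoid division by zero at spectral nulls.
    power = np.maximum(np.abs(X) ** 2, eps)
    return (X.real * Y.real + X.imag * Y.imag) / power

# Sanity check: a unit impulse delayed by 5 samples has a constant
# group delay of 5 samples at every frequency bin.
impulse = np.zeros(64)
impulse[5] = 1.0
tau = group_delay(impulse)
```

In a long-term analysis setting, `frame` would be a window considerably longer than the short frames typically used for MFCC extraction; the specific window length is left open here because the record does not give it.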


Pages: 1189-1192

Page count: 4
