Speech Recognition using Long-Term Phase Information

Cited by: 0
Authors
Yamamoto, Kazumasa [1]
Sueyoshi, Eiichi [1]
Nakagawa, Seiichi [1]
Affiliations
[1] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
speech recognition; phase information; long-term analysis; group delay
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Current speech recognition systems mainly use amplitude spectrum-based features such as MFCC as acoustic feature parameters and discard phase spectral information. The results of perceptual experiments, however, suggest that phase spectral information based on long-term analysis carries certain linguistic information. In this paper, we propose the use of phase features based on long-term analysis for speech recognition. We use two types of parameters: the delta phase parameter as a group delay, and analytic group delay features. Isolated word and continuous digit recognition experiments were performed, and each achieved a word or digit accuracy above 90%. The experimental results confirm that a long-term phase spectrum contains sufficient information for recognizing speech. Furthermore, combining the likelihoods of MFCC and the long-term group delay cepstrum outperformed the MFCC baseline by a relative 20% for clean speech.
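The abstract names group delay as the basis of its phase features but gives no formula. As a rough illustration only, the sketch below computes the textbook group delay of a single frame without phase unwrapping, using the standard identity τ(k) = Re(X(k)·conj(Y(k))) / |X(k)|², where X is the DFT of x[n] and Y is the DFT of n·x[n]. The function names `dft` and `group_delay`, the `eps` floor, and the 16-sample test frame are assumptions made for this example; the paper's long-term window length, its delta phase parameter, and its cepstral processing are not specified in this record and are not reproduced here.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay in samples for each DFT bin, computed without unwrapping.

    Uses tau(k) = Re(X(k) * conj(Y(k))) / |X(k)|^2 with
    X = DFT(x[n]) and Y = DFT(n * x[n]).
    The eps floor (an assumption of this sketch) avoids division by zero
    in bins where the magnitude spectrum vanishes.
    """
    X = dft(frame)
    Y = dft([t * v for t, v in enumerate(frame)])
    return [
        (xk * yk.conjugate()).real / max(abs(xk) ** 2, eps)
        for xk, yk in zip(X, Y)
    ]


# Hypothetical check: a unit impulse delayed by 3 samples.
frame = [0.0] * 16
frame[3] = 1.0
delays = group_delay(frame)
```

For a pure delay the group delay equals the delay in every bin, which makes the delayed impulse a convenient sanity check. On real speech, bins with near-zero magnitude make the raw estimate spiky, so a practical front end would need some form of smoothing.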
Pages: 1189-1192
Number of pages: 4
Related papers
50 in total
  • [41] Importance of Phase Information in Speech Enhancement
    Moon, Sang-Hyun
    Kim, Bonam
    Lee, In-Sung
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS (CISIS 2010), 2010, : 770 - 773
  • [42] Speech Recognition in an Enclosure with a Long Reverberation Time
    Kocinski, Jedrzej
    Ozimek, Edward
    ARCHIVES OF ACOUSTICS, 2016, 41 (02) : 255 - 264
  • [43] USING CONTEXTUAL INFORMATION IN JOINT FACTOR EIGENSPACE MLLR FOR SPEECH RECOGNITION IN DIVERSE SCENARIOS
    Saz, Oscar
    Hain, Thomas
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [44] Discriminative Named Entity Recognition of Speech Data using Speech Recognition Confidence
    Sudoh, Katsuhito
    Tsukada, Hajime
    Isozaki, Hideki
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 337 - 340
  • [45] Towards End-to-End Speech Recognition for Chinese Mandarin using Long Short-Term Memory Recurrent Neural Networks
    Li, Jie
    Zhang, Heng
    Cai, Xinyuan
    Xu, Bo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3615 - 3619
  • [46] Using mutual information criterion to design an efficient phoneme set for Chinese speech recognition
    Zhang, Jin-Song
    Hu, Xin-Hui
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (03) : 508 - 513
  • [47] Automated cleft speech evaluation using speech recognition
    Vucovich, Megan
    Hallac, Rami R.
    Kane, Alex A.
    Cook, Julie
    Van'T Slot, Cortney
    Seaward, James R.
    JOURNAL OF CRANIO-MAXILLOFACIAL SURGERY, 2017, 45 (08) : 1268 - 1271
  • [48] Estimation of Speech Intelligibility Using Speech Recognition Systems
    Takano, Yusuke
    Kondo, Kazuhiro
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (12) : 3368 - 3376
  • [49] Using Linear Models of Speech Trajectory in the Reconstructed Phase Space to Extract Useful Features for Speech Recognition System
    Shekofteh, Yasser
    Almasganj, Farshad
    2012 19TH IRANIAN CONFERENCE OF BIOMEDICAL ENGINEERING (ICBME), 2012, : 182 - 185
  • [50] Relative phase information for detecting human speech and spoofed speech
    Wang, Longbiao
    Yoshida, Yohei
    Kawakami, Yuta
    Nakagawa, Seiichi
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2092 - 2096