Time-Varying LP Cepstral Features for Improved Isolated Word Speech Recognition

Cited by: 0

Authors:

Ang, Federico [1]

Tsutsui, Hiroshi [1]

Miyanaga, Yoshikazu [1]

Affiliation:

[1] Hokkaido Univ, ICN Lab, Sapporo, Hokkaido 0600814, Japan

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP) | 2015

Keywords:

time-varying AR model; isolated word speech recognition; linear prediction

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline codes:

0808; 0809

Abstract:

Isolated word speech recognition for small-vocabulary tasks has achieved great success with Mel-frequency cepstral coefficients (MFCCs) as the speech feature of choice. Voice-controlled embedded systems that use word models as the basic units of speech have found their way into a variety of commercial products. While the recognition rates of these products can be considered commercially acceptable in clean environments, channel noise and other external factors can still degrade recognition performance in practice. We propose the use of cepstral features derived from time-varying linear predictive coding, in which the autoregressive model of the speech signal is represented by coefficients that are linear combinations of simple basis functions. Variations in the usage of the features, such as skipping adjacent features, averaging, and hybrid features, are investigated with the goal of improving performance on a 142-word-vocabulary, isolated word Japanese speech recognition task.
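The abstract states only that the AR coefficients are linear combinations of simple basis functions; it does not give the basis, the model order, or the estimation criterion. The sketch below is therefore a hedged illustration under assumptions, not the authors' implementation: it assumes the model s[n] ≈ sum_{i=1..p} a_i(n) s[n-i] with a_i(n) = sum_{k=0..q} a_ik u_k(n), a hypothetical polynomial basis u_k(n) = (n/N)^k over one analysis window, a least-squares fit over n = p..N-1, and the standard LPC-to-cepstrum recursion applied to the coefficients evaluated at a chosen time n. All function names (`tvlp_fit`, `lp_at`, `lp_to_cepstrum`, etc.) are hypothetical.

```python
def polynomial_basis(n: int, num_samples: int, num_basis: int) -> list[float]:
    """Assumed basis: powers of normalized time, u_k(n) = (n / N) ** k."""
    t = n / num_samples
    return [t ** k for k in range(num_basis)]


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("singular normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = s / aug[r][r]
    return x


def tvlp_fit(signal: list[float], order: int, num_basis: int) -> list[list[float]]:
    """Least-squares fit of s[n] ~ sum_i sum_k a_ik u_k(n) s[n-i] for n = order..N-1.

    Returns coeffs[i][k]: the weight of basis k for lag i + 1.
    """
    n_total = len(signal)
    dim = order * num_basis
    gram = [[0.0] * dim for _ in range(dim)]
    rhs = [0.0] * dim
    for n in range(order, n_total):
        u = polynomial_basis(n, n_total, num_basis)
        # Regressor vector: u_k(n) * s[n - i] for every (lag i, basis k) pair.
        x = [u[k] * signal[n - i] for i in range(1, order + 1) for k in range(num_basis)]
        for r in range(dim):
            rhs[r] += x[r] * signal[n]
            for c in range(dim):
                gram[r][c] += x[r] * x[c]
    theta = solve_linear(gram, rhs)
    return [theta[i * num_basis:(i + 1) * num_basis] for i in range(order)]


def lp_at(coeffs: list[list[float]], n: int, num_samples: int) -> list[float]:
    """Evaluate the time-varying coefficients a_i(n) = sum_k a_ik u_k(n) at sample n."""
    u = polynomial_basis(n, num_samples, len(coeffs[0]))
    return [sum(a_ik * u_k for a_ik, u_k in zip(row, u)) for row in coeffs]


def lp_to_cepstrum(a: list[float], num_ceps: int) -> list[float]:
    """LPC-to-cepstrum recursion for H(z) = 1 / (1 - sum_i a_i z^-i), gain term omitted.

    c_m = a_m + sum_{k=1}^{m-1} (k / m) c_k a_{m-k}, with a_j = 0 for j > p.
    """
    p = len(a)
    c: list[float] = []
    for m in range(1, num_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k - 1] * a[m - k - 1]
        c.append(acc)
    return c
```

As a usage example, one cepstral vector per analysis position could be taken as `[lp_to_cepstrum(lp_at(coeffs, n, len(s)), 12) for n in positions]`, where `positions` and the cepstral order 12 are illustrative choices. The solver is written in plain Python only to keep the sketch dependency-free; `numpy.linalg.lstsq` on the stacked regressors would be the usual choice. Texts that write the predictor polynomial as A(z) = 1 + sum_i a_i z^-i need the signs of a_i flipped before the recursion.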


Pages: 302-306

Number of pages: 5

Related articles (50 in total):

[1] Incorporation of Time-Varying LP Cepstral Features in HMM-Based Isolated Word Speech Recognition
Ang, Federico
Tsutsui, Hiroshi
Miyanaga, Yoshikazu
2015 INTERNATIONAL SYMPOSIUM ON SIGNALS, CIRCUITS AND SYSTEMS (ISSCS), 2015
[2] A method of extracting time-varying acoustic features effective for speech recognition
Tanaka, K
Kojima, H
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1391 - 1394
[3] Speech recognition using cepstral articulatory features
Najnin, Shamima
Banerjee, Bonny
SPEECH COMMUNICATION, 2019, 107 : 26 - 37
[4] Joint Bayesian Estimation of Time-Varying LP Parameters and Excitation for Speech
Chetupalli, Srikanth Raj
Sreenivas, T. V.
IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (04) : 357 - 361
[5] Recursive estimation of time-varying environments for robust speech recognition
Zhao, YX
Wang, SJ
Yen, KC
2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURAL NETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 225 - 228
[6] Robust speech recognition with time-varying filtering, interruptions, and noise
Lippmann, RP
Carlson, BA
1997 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, PROCEEDINGS, 1997, : 365 - 372
[7] Robust Speech Recognition Combining Cepstral and Articulatory Features
Zha, Zhuan-ling
Hu, Jin
Zhan, Qing-ran
Shan, Ya-hui
Xie, Xiang
Wang, Jing
Cheng, Hao-bo
PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 1401 - 1405
[8] NMF-based Cepstral Features for Speech Emotion Recognition
Lashkari, Milad
Seyedin, Sanaz
2018 4TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2018, : 189 - 193
[9] Multiple-microphone time-varying filters for robust speech recognition
Lai, CYK
Aarabi, P
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 233 - 236
[10] BANGLA ISOLATED WORD SPEECH RECOGNITION
Firoze, Adnan
Arifin, M. Shamsul
Quadir, Ryana
Rahman, Rashedur M.
ICEIS 2011: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 2, 2011, : 73 - 82
