Improvement of automatic speech recognition systems via nonlinear dynamical features evaluated from the recurrence plot of speech signals

Cited by: 17

Authors:

Firooz, Shabnam Gholamdokht ^{[1]}

Almasganj, Farshad ^{[1]}

Shekofteh, Yasser ^{[1]}

Affiliations:

[1] Amirkabir Univ Technol, Biomed Engn Dept, Hafez Ave, POB 15875-4413, Tehran, Iran

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2017 / Vol. 58

Keywords:

Automatic speech recognition; Mel-frequency cepstral coefficients; Reconstructed phase space; Recurrence plot; Two-dimensional wavelet transform; QUANTIFICATION ANALYSIS; MODELS;

DOI:

10.1016/j.compeleceng.2016.07.006

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Spectral features, which are typically used in Automatic Speech Recognition (ASR) systems, discard the phase information of speech signals. Additional features that retain the phase of the signal may fill this gap. One recently considered approach is to embed the speech signal in the Reconstructed Phase Space (RPS) and extract features from the embedding. In this paper, we follow this approach by computing features from the Recurrence Plot (RP) of speech signals embedded in the RPS; the proposed features are obtained by applying a two-dimensional wavelet transform to the resulting RP diagrams. The proposed features are evaluated in an ASR task, both alone and in combination with traditional Mel-Frequency Cepstral Coefficients (MFCC). In the combined case, on the English TIMIT corpus, phoneme recognition accuracy improves by 3.94% absolute over using MFCC features alone. © 2016 Elsevier Ltd. All rights reserved.
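As a rough illustration of the pipeline the abstract outlines (embed a speech frame in the RPS, build its recurrence plot, apply a two-dimensional wavelet transform), here is a minimal Python sketch. It is not the authors' implementation: the embedding dimension `dim`, the delay `tau`, the threshold rule `eps_ratio`, the choice of the Haar wavelet, and the use of sub-band energies as features are all assumptions made for illustration, since the abstract specifies none of them.

```python
import numpy as np

def delay_embed(x, dim=3, tau=6):
    """Time-delay embedding: row i is [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim/tau")
    return np.stack([x[k * tau : k * tau + n] for k in range(dim)], axis=1)

def recurrence_plot(points, eps_ratio=0.1):
    """Binary recurrence matrix: R[i, j] = 1 when ||p_i - p_j|| <= eps.

    eps is set to eps_ratio times the largest pairwise distance
    (an assumed thresholding rule, not taken from the paper).
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    eps = eps_ratio * dist.max()
    return (dist <= eps).astype(float)

def haar_dwt2(img):
    """One level of an orthonormal 2-D Haar transform (even-sized input)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # approximation
    lh = (a + b - c - d) / 2.0   # detail across rows
    hl = (a - b + c - d) / 2.0   # detail across columns
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh

def rp_wavelet_features(frame, dim=3, tau=6, eps_ratio=0.1, levels=2):
    """Mean energy of each detail sub-band per level, plus the final approximation."""
    rp = recurrence_plot(delay_embed(frame, dim, tau), eps_ratio)
    size = (rp.shape[0] // 2 ** levels) * 2 ** levels  # crop so every level is even-sized
    if size == 0:
        raise ValueError("frame too short for the requested number of levels")
    approx = rp[:size, :size]
    feats = []
    for _ in range(levels):
        approx, lh, hl, hh = haar_dwt2(approx)
        feats.extend(np.mean(band ** 2) for band in (lh, hl, hh))
    feats.append(np.mean(approx ** 2))
    return np.array(feats)

# Usage on a synthetic 25 ms frame at 16 kHz (a stand-in for real speech).
frame = np.sin(2 * np.pi * 200 * np.arange(400) / 16000)
features = rp_wavelet_features(frame)
```

In a combined system of the kind the abstract describes, a vector like `features` would be concatenated with the MFCC vector of the same frame before classification; how the paper actually performs that combination is not stated here.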

Citation

Pages: 215-226

Page count: 12

References (27 in total)

[1]

[Anonymous], 2004, WILEY SER PROB STAT

[2]

Bromiley PA, 2010, STAT SEGMENT SERIES

[3] LIBSVM: A Library for Support Vector Machines [J].

Chang, Chih-Chung ;

Lin, Chih-Jen .

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)

[4] RECURRENCE PLOTS OF DYNAMIC-SYSTEMS [J].

ECKMANN, JP ;

KAMPHORST, SO ;

RUELLE, D .

EUROPHYSICS LETTERS, 1987, 4 (09) :973-977

[5] Enhancing the Feature Extraction Process for Automatic Speech Recognition with Fractal Dimensions [J].

Ezeiza, Aitzol ;

Lopez de Ipina, Karmele ;

Hernandez, Carmen ;

Barroso, Nora .

COGNITIVE COMPUTATION, 2013, 5 (04) :545-550

[6] Chaos control in the cerium-catalyzed Belousov-Zhabotinsky reaction using recurrence quantification analysis measures [J].

Fatoorehchi, Hooman ;

Zarghami, Reza ;

Abolghasemi, Hossein ;

Rach, Randolph .

CHAOS SOLITONS & FRACTALS, 2015, 76 :121-129

[7]

Garofolo JS, 1993, 93 NASA STI REC

[8] Improving mispronunciation detection using adaptive frequency scale [J].

Ge, Zhenhao ;

Sharma, Sudhendu R. ;

Smith, Mark J. T. .

COMPUTERS & ELECTRICAL ENGINEERING, 2013, 39 (05) :1464-1472

[9] Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance [J].

Jafari, Ayyoob ;

Almasganj, Farshad ;

Bidhendi, Maryam Nabi .

CHAOS, 2010, 20 (03)

[10] Time-domain isolated phoneme classification using reconstructed phase spaces [J].

Johnson, MT ;

Povinelli, RJ ;

Lindgren, AC ;

Ye, JJ ;

Liu, XL ;

Indrebo, KM .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (04) :458-466
