A Robust Feature Extraction with Dual Fusion aided Extreme Learning for Audio-Visual Hindi Speech Recognition

Cited by: 0

Authors:

Sharma, Usha [1]

Om, Hari [2]

Mishra, A. N. [3]

Affiliations:

[1] Indian Inst Technol ISM Dhanbad, Dhanbad, Bihar, India

[2] Indian Inst Technol ISM Dhanbad, CSE Dept, Dhanbad, Bihar, India

[3] Krishna Engn Coll, Ghaziabad, India

Source:

JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH | 2020 / Vol. 79 / No. 05

Keywords:

Speech recognition; Audio-visual; Jaya optimization; Bottleneck DNN; ELM

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Subject classification code:

08

Abstract:

In the implementation of Automatic Speech Recognition (ASR) systems, robustness to noisy background conditions is a major challenge. This paper estimates both audio and visual features from different information-representation perspectives, with the aim of robust feature extraction from audio-visual speech images. The authors obtain bottleneck features from the bottleneck layer of a bottleneck deep neural network (BN-DNN). Two well-known texture descriptors, Local Binary Pattern (LBP) and Local Phase Quantization (LPQ), are applied to obtain the visual features from the face region. Classification is performed with an Extreme Learning Machine (ELM), and the Jaya optimization algorithm is used to reach the global optimum for audio-visual Hindi speech recognition. The proposed scheme is evaluated on the MATLAB platform, and its performance is compared with existing audio-visual speech recognition (AVSR) approaches.
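
The abstract names the Jaya optimization algorithm but does not say how it is coupled to the ELM. As a rough illustration only, the sketch below applies the basic Jaya update rule (move each candidate toward the current best solution and away from the current worst, then keep the candidate only if it improves) to a toy objective. It is written in Python rather than the MATLAB used in the paper, and the function name jaya_minimize, the sphere objective, the population size, and the iteration count are all assumptions chosen for the example, not details taken from the paper.

```python
import random


def jaya_minimize(objective, bounds, pop_size=20, iterations=200, seed=0):
    """Minimize `objective` inside box `bounds` using the basic Jaya update.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    history = [min(scores)]  # best score before any update, then after each iteration

    for _ in range(iterations):
        # Best and worst members are fixed for the whole iteration.
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            candidate = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst one.
                value = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                # Clip the new coordinate back into the search box.
                candidate.append(min(max(value, lo), hi))
            cand_score = objective(candidate)
            # Greedy acceptance: replace a member only if the candidate is better.
            if cand_score < scores[i]:
                pop[i], scores[i] = candidate, cand_score
        history.append(min(scores))

    k = scores.index(min(scores))
    return pop[k], scores[k], history


def sphere(x):
    """Toy objective (assumed for the example): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


solution, score, history = jaya_minimize(sphere, [(-5.0, 5.0)] * 2)
```

Jaya has no algorithm-specific tuning parameters beyond population size and iteration count, which is one commonly cited reason for pairing it with a classifier. In a setting like the one described in the abstract, the toy objective would be replaced by a validation error of the classifier, but that coupling is an assumption here.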


Pages: 383-386

Page count: 4
