RETRACTED: Audio-Visual Automatic Speech Recognition Towards Education for Disabilities (Retracted Article)

Times cited: 14
Authors
Debnath, Saswati [1]
Roy, Pinki [2]
Namasudra, Suyel [3, 4]
Crespo, Ruben Gonzalez [4]
Affiliations
[1] Alliance Univ, Dept Comp Sci & Engn, Bangalore, Karnataka, India
[2] Natl Inst Technol, Dept Comp Sci & Engn, Silchar, Assam, India
[3] Natl Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
[4] Univ Int La Rioja, Logrono, Spain
Keywords
AV-ASR; LBP-TOP; GLCM; MFCC; Clustering algorithm; Supervised learning
DOI
10.1007/s10803-022-05654-4
Chinese Library Classification (CLC) number
B844 [Developmental psychology (human psychology)]
Discipline classification code
040202
Abstract
Education is a fundamental right that enriches everyone's life. However, physically challenged people are often excluded from the general and advanced education system. A system based on Audio-Visual Automatic Speech Recognition (AV-ASR) can improve the education of physically challenged people by providing hands-free computing, letting them communicate with the learning system through AV-ASR. However, tracking the lips accurately for the visual modality is challenging. This paper therefore combines appearance-based visual features with co-occurrence statistical measures for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
Pages: 3581-3594
Number of pages: 14