Augmenting machine learning for Amharic speech recognition: a paradigm of patient’s lips motion detection

Cited by: 0
Authors
Muluken Birara
Gebeyehu Belay Gebremeskel
Affiliation
[1] Bahir Dar University, Bahir Dar Institute of Technology
Source
Multimedia Tools and Applications | 2022, Vol. 81
Keywords
Machine learning; Speech recognition; Lips motion; Average feature; Saturated component;
DOI: not available
Abstract
Automatic lip motion recognition is an essential input for visual speech detection. It is a technological approach that assists people who are hard of hearing or deaf and addresses the challenge of silent communication in day-to-day life. However, the recognition process is challenging due to pronunciation variation, speech speed, gesture variation, color, makeup, the video quality of the camera, and the method of feature extraction. This paper proposes a solution for automatic lip motion recognition for the Amharic language by identifying lip movements and characterizing their association with the spoken words, using the information available in the lip movements. The input video is converted into consecutive image frames. We use the Viola-Jones object detection algorithm to locate the face, convert it into the YIQ color space, and apply the saturation component to detect the lip region within the face area. Sobel edge detection and morphological image operations are implemented to identify and extract the exact contour of the lips. We applied ANN and SVM classifiers to averaged shape-information features and obtained classification accuracies of 65.71% for the ANN and 66.43% for the SVM. The findings present Amharic speech recognition as a newly introduced technology to enhance the academic and linguistic skills of people with hearing problems, health-domain experts, physicians, researchers, etc. Future research directions are presented in light of the findings.
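The color-based stage of the pipeline above (RGB frames converted to YIQ, a chrominance threshold to isolate lip-colored pixels, then Sobel edges toward the lip contour) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the standard NTSC RGB-to-YIQ matrix is assumed, the threshold value is an arbitrary placeholder, and the face localization and morphological steps are omitted.

```python
import numpy as np

# Standard NTSC RGB -> YIQ conversion matrix. The paper reports using the
# YIQ color space; the threshold below is an illustrative assumption,
# not a value taken from the paper.
RGB2YIQ = np.array([
    [0.299,  0.587,  0.114],
    [0.596, -0.274, -0.322],
    [0.211, -0.523,  0.312],
])

def rgb_to_yiq(frame):
    """frame: (H, W, 3) float array in [0, 1] -> (H, W, 3) YIQ array."""
    return frame @ RGB2YIQ.T

def lip_candidate_mask(frame, i_thresh=0.15):
    """Mark reddish pixels via the I (in-phase) chrominance component.

    Lip pixels tend to have higher I values than the surrounding skin,
    so a simple threshold yields a rough candidate mask.
    """
    yiq = rgb_to_yiq(frame)
    return yiq[..., 1] > i_thresh

def sobel_edges(gray):
    """Gradient magnitude with 3x3 Sobel kernels (zero-padded borders)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(gray, 1)
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for r in range(3):          # accumulate the correlation one tap at a time
        for c in range(3):
            win = p[r:r + h, c:c + w]
            gx += kx[r, c] * win
            gy += ky[r, c] * win
    return np.hypot(gx, gy)
```

In a full system, the mask would be computed only inside the Viola-Jones face region, cleaned up with morphological opening/closing, and the Sobel magnitude would then be traced to recover the lip contour whose averaged shape features feed the ANN/SVM classifiers.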
Pages: 24377-24397 (20 pages)