Augmenting machine learning for Amharic speech recognition: a paradigm of patient’s lips motion detection

Cited by: 0

Authors:

Muluken Birara

Gebeyehu Belay Gebremeskel

Affiliation:

[1] Bahir Dar University, Bahir Dar Institute of Technology

Source:

Multimedia Tools and Applications | 2022 / Vol. 81

Keywords:

Machine learning; Speech recognition; Lips motion; Average feature; Saturated component

DOI:

Not available

Abstract:

Automatic lip motion recognition is an essential input to visual speech recognition. It is a technological approach to assisting people who are hard of hearing or deaf, and to addressing the challenge of silent communication in day-to-day life. However, recognition is difficult because of variations in pronunciation, speaking speed, gesture, and color, as well as makeup, the video quality of the camera, and the method of feature extraction. This paper proposes a solution for automatic lip motion recognition in the Amharic language. It identifies lip movements and characterizes their association with spoken words, using the information available in those movements. The input video is converted into consecutive image frames. We use the Viola-Jones object detection algorithm, the YIQ color space, and saturation components to detect lip images within the face area. Sobel edge detection and morphological image operations are then applied to identify and extract the exact contour of the lip. We applied ANN and SVM classifiers to averaged shape-information features and obtained classification accuracies of 65.71% and 66.43%, respectively. The Amharic speech recognition approach presented here is a newly introduced technology intended to enhance the academic and linguistic skills of people with hearing problems, and to support health-domain experts, physicians, researchers, and others. Directions for future research are presented in light of these findings.
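The abstract names several image-processing steps without showing them. Below is a minimal, dependency-free Python sketch of four of them, operating on plain nested lists. It is not the authors' implementation.

The four steps are:

- An RGB-to-YIQ pixel conversion. It uses the standard NTSC coefficients; which matrix variant the paper uses is an assumption.
- A 3x3 Sobel gradient magnitude.
- A threshold followed by binary dilation, as one example of a morphological operation.
- An element-wise mean of per-frame feature vectors. This is only one possible reading of "averaging shape information features".

Viola-Jones face detection and the ANN/SVM classifiers are omitted. All function names and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]

def rgb_to_yiq(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel (channels in [0, 1]) to (Y, I, Q).

    Assumption: the standard NTSC coefficients; the paper's exact matrix is not given.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

# 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_magnitude(img: Sequence[Sequence[float]]) -> Image:
    """Sobel gradient magnitude of a grayscale image; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = img[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * v
                    gy += SOBEL_Y[dr + 1][dc + 1] * v
            out[r][c] = math.hypot(gx, gy)
    return out

def threshold(img: Sequence[Sequence[float]], t: float) -> List[List[int]]:
    """Binary edge mask: 1 where the gradient magnitude is at least t (hypothetical t)."""
    return [[1 if v >= t else 0 for v in row] for row in img]

def dilate(mask: Sequence[Sequence[int]]) -> List[List[int]]:
    """3x3 binary dilation: a pixel becomes 1 if any neighbour (or itself) is 1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                mask[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ))
    return out

def average_features(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of per-frame shape-feature vectors (an assumed reading)."""
    n = len(per_frame)
    return [sum(col) / n for col in zip(*per_frame)]

if __name__ == "__main__":
    # A vertical step edge: two dark columns followed by three bright ones.
    step = [[0, 0, 1, 1, 1] for _ in range(5)]
    edges = sobel_magnitude(step)
    print(edges[2])  # [0.0, 4.0, 4.0, 0.0, 0.0]
    print(threshold(edges, 1.0)[2])  # [0, 1, 1, 0, 0]
```

Plain lists keep the sketch self-contained. A practical pipeline would more likely use an image library such as OpenCV, whose `cv2.Sobel` and `cv2.dilate` perform the corresponding operations far faster.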

Pages: 24377-24397

Number of pages: 20
