Sentences Prediction Based on Automatic Lip-Reading Detection with Deep Learning Convolutional Neural Networks Using Video-Based Features

Cited by: 0
Authors
Mahboob, Khalid [1]
Nizami, Hafsa [1]
Ali, Fayyaz [1]
Alvi, Farrukh [1]
Affiliations
[1] Sir Syed Univ Engn & Technol, Dept Software Engn, Karachi, Pakistan
Source
SOFT COMPUTING IN DATA SCIENCE, SCDS 2021 | 2021 / Vol. 1489
Keywords
Lip-reading; Convolutional neural networks; Model
DOI
10.1007/978-981-16-7334-4_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Lip-reading is the process of deciphering text from a speaker's facial, lip, and mouth movements without using audio. The challenge is traditionally divided into two stages: creating or learning visual characteristics, and prediction. End-to-end techniques for deep lip-reading have become popular in recent years. Existing work on end-to-end models, however, performs only word classification rather than sentence-level sequence prediction. Longer words improve human lip-reading ability, suggesting the relevance of characteristics that capture the temporal context in an inconsistent communication channel. In this study, an end-to-end model based on deep learning convolutional neural networks has been employed to develop an automated lip-reading system that uses spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss to translate a variable-length series of video frames to text. The accuracy of the trained lip-reading process in predicting sentences was evaluated using video-based features.
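The connectionist temporal classification (CTC) step named in the abstract can be illustrated with a minimal greedy-decoding sketch. This is not the authors' implementation: the alphabet, the use of `_` at index 0 as the CTC blank, and the helper names `ctc_greedy_decode` and `one_hot_frames` are all assumptions made for illustration. It shows how per-frame character scores of any length collapse into one text string by merging repeated labels and then dropping blanks.

```python
from itertools import groupby

# Assumed alphabet for illustration; index 0 ("_") plays the role of the CTC blank.
ALPHABET = "_ abcdefghijklmnopqrstuvwxyz"


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank_index=0):
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index for every video frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # Collapse runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best)]
    # Remove the blank symbol and map the remaining indices to characters.
    return "".join(alphabet[i] for i in collapsed if i != blank_index)


def one_hot_frames(path, alphabet=ALPHABET):
    """Hypothetical helper: fake per-frame scores from a per-frame label string."""
    return [[1.0 if symbol == ch else 0.0 for symbol in alphabet] for ch in path]


if __name__ == "__main__":
    # Two frame sequences of different lengths that decode to the same text.
    print(ctc_greedy_decode(one_hot_frames("hh_e_ll_ll_oo")))
    print(ctc_greedy_decode(one_hot_frames("h_e_l_l_o")))
```

The blank between the two `l` runs is what lets a doubled letter survive the merge step. In a trained system the scores would come from the network's output layer rather than from a hand-written label string.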
Pages: 42-53
Number of pages: 12