Sentences Prediction Based on Automatic Lip-Reading Detection with Deep Learning Convolutional Neural Networks Using Video-Based Features

Cited by: 0

Authors:

Mahboob, Khalid [1]

Nizami, Hafsa [1]

Ali, Fayyaz [1]

Alvi, Farrukh [1]

Affiliations:

[1] Sir Syed Univ Engn & Technol, Dept Software Engn, Karachi, Pakistan

Source:

SOFT COMPUTING IN DATA SCIENCE, SCDS 2021 | 2021, Vol. 1489

Keywords:

Lip-reading; Convolutional neural networks; Model

DOI:

10.1007/978-981-16-7334-4_4

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Lip-reading is the process of deciphering text by visually interpreting a speaker's facial, lip, and mouth movements without using audio. The challenge is traditionally divided into two stages: creating or learning visual characteristics, and prediction. End-to-end techniques for deep lip-reading have become popular in recent years. Existing work on end-to-end models, however, performs only word classification rather than sentence-level sequence prediction. Longer words improve human lip-reading ability, suggesting the relevance of characteristics that capture the temporal context in an inconsistent communication channel. In this study, an end-to-end model based on deep learning convolutional neural networks has been employed to develop an automated lip-reading system that uses a recurrent network, spatiotemporal convolutions, and the connectionist temporal classification loss to translate a variable-length series of video frames to text. The accuracy of the trained lip-reading process in predicting sentences was evaluated using video-based features.
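The abstract names the connectionist temporal classification (CTC) loss as the mechanism that maps a variable-length series of frames to text, but gives no implementation details. As a rough illustration only, the sketch below shows greedy (best-path) CTC decoding, one common way to turn per-frame scores into a sentence. The `ALPHABET`, the blank index, and the function names are assumptions made for this example and are not taken from the paper.

```python
import string
from itertools import groupby

# Assumed label set: index 0 is the CTC blank, followed by space and a-z.
BLANK = 0
ALPHABET = [""] + list(" " + string.ascii_lowercase)


def ctc_greedy_decode(frame_scores):
    """Best-path CTC decoding.

    frame_scores: one list of per-label scores for each video frame.
    Takes the top-scoring label per frame, merges consecutive repeats,
    then drops blanks.
    """
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(ALPHABET[i] for i in collapsed if i != BLANK)


def one_hot(symbol):
    """Hypothetical helper: a score vector that puts all mass on one symbol."""
    scores = [0.0] * len(ALPHABET)
    scores[ALPHABET.index(symbol)] = 1.0
    return scores


# Eight frames whose best path is "h h _ e l _ l o" (with _ as the blank).
demo_scores = [one_hot(s) for s in ["h", "h", "", "e", "l", "", "l", "o"]]
decoded = ctc_greedy_decode(demo_scores)
```

The blank between the two `l` frames is what lets a doubled letter survive the merge step; without it, the repeated label would collapse to a single character.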


Pages: 42-53

Page count: 12

