While the acoustic signal is the primary cue in human speech recognition, visual cues are also useful, especially when the acoustic signal is degraded. A computer system is developed for automatic recognition of continuously spoken words using only visual data. The velocity of lip motion is measured from optical flow data, which allows muscle action to be estimated. Pauses in muscle action produce zero flow velocity and are used to locate word boundaries; the pattern of muscle action is then used to recognize the spoken words. In limited experiments on digit recognition, the visually derived patterns of muscle action appear stable across multiple utterances of the same word. The patterns are similar enough across speakers that speaker-independent recognition is possible. An overall accuracy of approximately 70 percent, including both word spotting and recognition, is obtained for continuously spoken test samples from three speakers.
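The boundary-location step described above can be illustrated with a minimal sketch. The paper derives lip velocity from optical flow; here `velocity` is a hypothetical per-frame mean flow magnitude, and the function, threshold, and minimum pause length are all illustrative assumptions, not the paper's actual method.

```python
def find_word_boundaries(velocity, threshold=0.05, min_pause=3):
    """Return (start, end) frame ranges where lip motion stays near zero
    for at least `min_pause` frames -- treated here as pauses in muscle
    action that mark word boundaries (assumed parameters)."""
    pauses = []
    start = None
    for i, v in enumerate(velocity):
        if v < threshold:
            # Possible start of a pause: motion is near zero.
            if start is None:
                start = i
        else:
            # Motion resumed; keep the pause only if it was long enough.
            if start is not None and i - start >= min_pause:
                pauses.append((start, i))
            start = None
    # Handle a pause that runs to the end of the sequence.
    if start is not None and len(velocity) - start >= min_pause:
        pauses.append((start, len(velocity)))
    return pauses


# Synthetic flow magnitudes: pause, word, pause, word.
velocity = [0.0] * 4 + [0.8] * 6 + [0.0] * 5 + [0.7] * 4
print(find_word_boundaries(velocity))  # [(0, 4), (10, 15)]
```

The segments between successive pauses would then be taken as candidate words for the recognition stage.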