On the role of multimodal learning in the recognition of sign language

Cited by: 0

Authors:

Pedro M. Ferreira

Jaime S. Cardoso

Ana Rebelo

Affiliations:

[1] INESC TEC and Universidade do Porto

[2] INESC TEC and Univ Portucalense

Source:

Multimedia Tools and Applications | 2019 / Vol. 78

Keywords:

Sign language recognition; Multimodal learning; Convolutional neural networks; Kinect; Leap Motion

DOI:

Not available

Abstract:

Sign Language Recognition (SLR) has become one of the most important research areas in the field of human-computer interaction. SLR systems are meant to automatically translate sign language into text or speech, in order to reduce the communication gap between deaf and hearing people. The aim of this paper is to exploit multimodal learning techniques for accurate SLR, making use of data provided by Kinect and Leap Motion. In this regard, single-modality approaches as well as different multimodal methods, mainly based on convolutional neural networks, are proposed. Our main contribution is a novel multimodal end-to-end neural network that explicitly models private feature representations that are specific to each modality and shared feature representations that are similar between modalities. The underlying idea is that imposing such regularization in the learning process increases the discriminative ability of the learned features and, hence, improves the generalization capability of the model. Experimental results demonstrate that multimodal learning yields an overall improvement in sign recognition performance. In particular, the novel neural network architecture outperforms the current state-of-the-art methods for the SLR task.
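The private/shared idea in the abstract can be illustrated with a small sketch of a regularizer over per-modality features. The abstract does not state the actual loss terms, so everything here is an assumption made for illustration: the squared-distance similarity penalty, the orthogonality-style difference penalty, the function names, and the weights `alpha` and `beta` are all hypothetical and may differ from what the paper uses.

```python
# Hypothetical sketch: these penalties are assumptions, not the paper's stated loss.
# Features are plain lists of rows, shaped (batch, dim).


def similarity_penalty(shared_a, shared_b):
    """Batch-averaged squared Euclidean distance between two modalities'
    shared features. Small when the shared representations agree."""
    total = 0.0
    for row_a, row_b in zip(shared_a, shared_b):
        total += sum((x - y) ** 2 for x, y in zip(row_a, row_b))
    return total / len(shared_a)


def difference_penalty(private, shared):
    """Squared Frobenius norm of private^T @ shared. Zero when every private
    feature column is orthogonal (across the batch) to every shared column."""
    total = 0.0
    for i in range(len(private[0])):
        for j in range(len(shared[0])):
            dot = sum(p[i] * s[j] for p, s in zip(private, shared))
            total += dot ** 2
    return total


def regularizer(kinect_private, kinect_shared, leap_private, leap_shared,
                alpha=1.0, beta=1.0):
    """Combine both penalties; alpha and beta are hypothetical weights."""
    sim = similarity_penalty(kinect_shared, leap_shared)
    diff = (difference_penalty(kinect_private, kinect_shared)
            + difference_penalty(leap_private, leap_shared))
    return alpha * sim + beta * diff
```

In a training loop, a term like this would be added to the classification loss, so that the shared branches of the Kinect and Leap Motion streams are pulled together while each private branch is pushed to encode something different from its shared counterpart.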

Pages: 10035-10056

Page count: 21
