Connectionist Temporal Modeling of Video and Language: a Joint Model for Translation and Sign Labeling

Cited by: 0

Authors:

Guo, Dan ^{[1]}

Tang, Shengeng ^{[1]}

Wang, Meng ^{[1]}

Affiliations:

[1] Hefei University of Technology, School of Computer Science and Information Engineering, Hefei, People's Republic of China

Source:

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence | 2019

Funding:

National Natural Science Foundation of China

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Online sign interpretation suffers from challenges presented by hybrid semantics learning among sequential variations of visual representations, sign linguistics, and textual grammars. This paper proposes a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To acquire short-term temporal correlations, a Temporal Convolution Pyramid (TCP) module is performed on 2D CNN features to realize (2D+1D) = 'pseudo 3D' CNN features. CTM aligns the 'pseudo 3D' with the original 3D CNN clip features and fuses them. Next, we implement a connectionist decoding scheme for long-term sequential learning. Here, we embed dynamic programming into the decoding scheme, which learns temporal mapping among features, sign labels, and the generated sentence directly. The solution using dynamic programming to sign labeling is considered as pseudo labels. Finally, we utilize the pseudo supervision cues in an end-to-end framework. A joint objective function is designed to measure feature correlation, entropy regularization on sign labeling, and probability maximization on sentence decoding. The experimental results using the RWTH-PHOENIX-Weather and USTC-CSL datasets demonstrate the effectiveness of the proposed approach.
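
The abstract says that a dynamic-programming solution over the sign labeling is used as pseudo labels, but this record does not give the algorithm. As a rough illustration only, the sketch below assumes a CTC-style setup (a blank symbol plus per-frame log-probabilities) and finds the single best frame-level path that collapses to a given label sequence. The function name `ctc_viterbi_align`, the blank index `0`, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def ctc_viterbi_align(log_probs, labels, blank=0):
    """Best frame-level path that collapses to `labels` (assumed CTC-style rules).

    log_probs: list of T rows, each a list of per-class log-probabilities.
    labels:    non-empty target label sequence without blanks.
    Returns a list of T class indices (labels and blanks).
    """
    T = len(log_probs)
    # Extended sequence: blank, l1, blank, l2, ..., lL, blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)
    NEG = -math.inf

    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start in the leading blank or in the first label.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one,
            # or skip the blank between two different labels.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG:
                continue  # state not reachable at this frame
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the last label or in the trailing blank.
    s = max((S - 1, S - 2), key=lambda p: score[T - 1][p])
    if score[T - 1][s] == NEG:
        raise ValueError("labels cannot be aligned to this many frames")

    # Backtrack from the last frame to the first.
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    path.reverse()
    return path


# Toy example: 4 frames, classes {0: blank, 1, 2}, target labels [1, 2].
toy = [[0.1, 0.8, 0.1],
       [0.1, 0.8, 0.1],
       [0.1, 0.1, 0.8],
       [0.8, 0.1, 0.1]]
toy_log = [[math.log(p) for p in row] for row in toy]
print(ctc_viterbi_align(toy_log, [1, 2]))  # prints [1, 1, 2, 0]
```

In a training loop of the kind the abstract describes, a path like this could serve as per-frame pseudo supervision; how the paper actually combines it with the other loss terms is not stated in this record.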

Pages: 751-757

Page count: 7
