Connectionist Temporal Modeling of Video and Language: a Joint Model for Translation and Sign Labeling

Cited by: 0
Authors
Guo, Dan [1]
Tang, Shengeng [1]
Wang, Meng [1]
Affiliation
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
Source
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019
Funding
National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online sign interpretation is challenging because it requires learning hybrid semantics across sequential variations of visual representations, sign linguistics, and textual grammars. This paper proposes a Connectionist Temporal Modeling (CTM) network for sentence translation and sign labeling. To capture short-term temporal correlations, a Temporal Convolution Pyramid (TCP) module is applied to 2D CNN features to produce (2D+1D) = pseudo-3D' CNN features. CTM aligns these pseudo-3D' features with the original 3D CNN clip features and fuses them. Next, we implement a connectionist decoding scheme for long-term sequential learning. Here, we embed dynamic programming into the decoding scheme, which directly learns the temporal mapping among features, sign labels, and the generated sentence. The sign labeling obtained through dynamic programming is treated as pseudo labels. Finally, we exploit these pseudo supervision cues in an end-to-end framework. A joint objective function is designed that combines feature correlation, entropy regularization on sign labeling, and probability maximization on sentence decoding. Experimental results on the RWTH-PHOENIX-Weather and USTC-CSL datasets demonstrate the effectiveness of the proposed approach.
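The abstract does not spell out how dynamic programming turns decoder outputs into sign-label pseudo labels. As a hedged illustration only, the sketch below assumes a CTC-style label topology (a blank class, repeats merged on collapse) and recovers one label per clip with a Viterbi best-path alignment. This is one common way to realize such an alignment, not the authors' implementation; the function names `ctc_viterbi_align` and `ctc_collapse`, the blank index 0, and the toy probabilities are all hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def ctc_viterbi_align(log_probs: Sequence[Sequence[float]],
                      labels: Sequence[int],
                      blank: int = 0) -> List[int]:
    """Best-path (Viterbi) alignment of a gloss sequence to T steps.

    Assumption (not from the paper): a CTC-style topology where
    log_probs[t][c] is the log-probability of class c at step t, class
    `blank` means "no gloss", and `labels` holds the target gloss ids
    (none equal to `blank`). Returns one class id per step, i.e.
    step-level pseudo labels.
    """
    T = len(log_probs)
    # Interleave blanks: [blank, l1, blank, l2, ..., lL, blank]
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[-1] * S for _ in range(T)]
    # A path may start on the leading blank or on the first gloss.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay on s, advance from s-1, or skip a blank
            # from s-2 (only allowed when the two glosses differ).
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue  # state s is unreachable at step t
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends on the last gloss or on the trailing blank.
    finals = [S - 1, S - 2] if S > 1 else [0]
    s = max(finals, key=lambda p: score[T - 1][p])
    if score[T - 1][s] == NEG_INF:
        raise ValueError("too few steps for this label sequence")

    # Follow back-pointers from the best final state to step 0.
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


def ctc_collapse(path: Sequence[int], blank: int = 0) -> List[int]:
    """Merge repeated ids, then drop blanks (standard CTC collapse)."""
    out, prev = [], None
    for c in path:
        if c != blank and c != prev:
            out.append(c)
        prev = c
    return out


if __name__ == "__main__":
    # Toy 4-step example with classes {0: blank, 1: gloss A, 2: gloss B}.
    probs = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.7, 0.1, 0.2]]
    log_probs = [[math.log(p) for p in row] for row in probs]
    path = ctc_viterbi_align(log_probs, [1, 2])
    print(path)                # → [1, 0, 2, 0]
    print(ctc_collapse(path))  # → [1, 2]
```

Taking the max over predecessors (rather than the sum used by a CTC-style training loss) is what yields a single hard labeling that could serve as pseudo supervision.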
Pages: 751-757
Number of pages: 7