Temporal Convolutional and Recurrent Networks for Image Captioning

Times cited: 0
Authors
Iskra, Natalia [1]
Iskra, Vitaly [2]
Affiliations
[1] Belarusian State Univ Informat & Radioelect, Minsk, BELARUS
[2] Omnigon Commun LLC, New York, NY USA
Source
PATTERN RECOGNITION AND INFORMATION PROCESSING, PRIP 2019 | 2019 / Vol. 1055
Keywords
Image captioning; Convolutional neural networks; Recurrent neural networks; Visual Genome; Dilated convolution; Weight normalization; Dropout; Adam optimization;
DOI
10.1007/978-3-030-35430-5_21
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, temporal convolutional networks have shown excellent performance in sequence modeling tasks [1]. With this in mind, in this paper we investigate the possibility of replacing recurrent networks in architectures targeted specifically at image captioning. We evaluate the solution on the Visual Genome dataset [2], which provides an extensive set of labels and descriptions that thoroughly grounds visual concepts in natural language.
Pages: 254-266
Number of pages: 13
Related papers
18 in total
  • [1] Bai Shaojie, 2018, Universal language model fine-tuning for text classification
  • [2] Banerjee S., 2005, P ACM WORKSH INTR EX
  • [3] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [4] A Comprehensive Survey of Deep Learning for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    [J]. ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [5] Recurrent Fusion Network for Image Captioning
    Jiang, Wenhao
    Ma, Lin
    Jiang, Yu-Gang
    Liu, Wei
    Zhang, Tong
    [J]. COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 510 - 526
  • [6] DenseCap: Fully Convolutional Localization Networks for Dense Captioning
    Johnson, Justin
    Karpathy, Andrej
    Fei-Fei, Li
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 4565 - 4574
  • [7] Karpathy A, 2015, PROC CVPR IEEE, P3128, DOI 10.1109/CVPR.2015.7298932
  • [8] Kingma DP, 2014, ADV NEUR IN, V27
  • [9] Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
    Krishna, Ranjay
    Zhu, Yuke
    Groth, Oliver
    Johnson, Justin
    Hata, Kenji
    Kravitz, Joshua
    Chen, Stephanie
    Kalantidis, Yannis
    Li, Li-Jia
    Shamma, David A.
    Bernstein, Michael S.
    Li Fei-Fei
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) : 32 - 73
  • [10] Kulkarni G, 2011, PROC CVPR IEEE, P1601, DOI 10.1109/CVPR.2011.5995466