Image captioning with data augmentation using cropping and mask based on attention image

Authors
Iwamura K.
Louhi Kasahara J.Y.
Moro A.
Yamashita A.
Asama H.
Keywords
Attention; Cropping; Data augmentation; Deep learning; Mask
DOI
10.2493/jjspe.86.904
Abstract
Automatic image captioning has various important applications, such as describing image content for the visually impaired. Most approaches use deep learning and have achieved remarkable results. However, some issues remain unresolved. One of them is overfitting of the trained model to specific images, usually caused by limited training dataset sizes. To augment the training dataset in such scenarios, previous studies proposed data augmentation using random cropping or masking. However, these methods do not specifically target overfitted regions in images and may therefore remove areas that are needed to generate captions, lowering performance. In this study, we propose a novel data augmentation method that uses attention to specifically target the image regions subject to overfitting. Experimental results show that the proposed method enables the generation of better image captions. © 2020 Japan Society for Precision Engineering. All rights reserved.
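The abstract does not say how the attention map is turned into a crop or mask region, so the Python sketch below is a hypothetical illustration, not the authors' implementation. It assumes non-negative attention weights on a coarse grid, treats every cell whose weight is at least a fraction `threshold` of the peak as the region "subject to overfitting" (an assumption), and then either masks that region's bounding box or crops the image to the largest strip that excludes it. The function names `attention_box`, `attention_mask` and `attention_crop`, the threshold rule and the strip-based crop are all assumptions made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]  # row-major 2-D array (a single image channel)


def attention_box(attn: Grid, img_h: int, img_w: int,
                  threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Hypothetical rule: pixel box (top, left, bottom, right), bottom/right
    exclusive, covering every attention cell with weight >= threshold * peak."""
    assert 0.0 < threshold <= 1.0, "threshold must be in (0, 1]"
    h_a, w_a = len(attn), len(attn[0])
    peak = max(max(row) for row in attn)  # assumes non-negative weights
    hot = [(r, c) for r, row in enumerate(attn)
           for c, v in enumerate(row) if v >= threshold * peak]
    top = min(r for r, _ in hot)
    bottom = max(r for r, _ in hot) + 1
    left = min(c for _, c in hot)
    right = max(c for _, c in hot) + 1
    # Scale attention-grid coordinates up to image pixel coordinates.
    return (top * img_h // h_a, left * img_w // w_a,
            bottom * img_h // h_a, right * img_w // w_a)


def attention_mask(image: Grid, attn: Grid, threshold: float = 0.5,
                   fill: float = 0.0) -> Grid:
    """Copy of `image` with the high-attention box overwritten by `fill`."""
    top, left, bottom, right = attention_box(attn, len(image), len(image[0]), threshold)
    out = [row[:] for row in image]  # leave the original image untouched
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = fill
    return out


def attention_crop(image: Grid, attn: Grid, threshold: float = 0.5) -> Grid:
    """Largest full-width or full-height strip of `image` that excludes the
    high-attention box (a hypothetical cropping rule)."""
    h, w = len(image), len(image[0])
    top, left, bottom, right = attention_box(attn, h, w, threshold)
    # Candidate crops as (row_start, row_end, col_start, col_end):
    # above, below, left of, and right of the box.
    candidates = [(0, top, 0, w), (bottom, h, 0, w), (0, h, 0, left), (0, h, right, w)]
    r0, r1, c0, c1 = max(candidates, key=lambda b: (b[1] - b[0]) * (b[3] - b[2]))
    if (r1 - r0) * (c1 - c0) == 0:
        return [row[:] for row in image]  # box covers the whole image
    return [row[c0:c1] for row in image[r0:r1]]


# Toy example: a 4x4 image and a 2x2 attention grid peaking at the top-left cell.
image = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
attn = [[0.9, 0.1], [0.05, 0.05]]
masked = attention_mask(image, attn)   # top-left 2x2 pixels set to 0.0
cropped = attention_crop(image, attn)  # bottom two rows, which exclude the box
```

Unlike random cropping or masking, both functions derive the removed region from the attention map, which is the distinction the abstract draws. A real pipeline would operate on multi-channel tensors and would take the attention weights from the captioning model itself; the plain nested lists here only keep the sketch dependency-free.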
Pages: 904-910
Number of pages: 6