Image-Captioning Model Compression

Times cited: 3
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliation
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / No. 3
Keywords
image captioning; model compression; pruning; quantization; network
DOI
10.3390/app12031638
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). Current captioning models are good enough for practical use, but they require both substantial computational power and considerable storage space. Despite the practical importance of image captioning, only a few papers have investigated compressing these models for deployment on mobile devices. Moreover, those works usually compress only the decoder of a typical encoder-decoder architecture, even though the encoder traditionally occupies most of the space. We applied the most efficient model-compression techniques, such as architectural changes, pruning, and quantization, to several state-of-the-art image-captioning architectures. As a result, every model was compressed by at least 91% in memory (including the encoder), while losing no more than 2% in CIDEr and no more than 4.5% in SPICE. The best model achieved 127.4 CIDEr and 21.4 SPICE at a size of only 34.8 MB; this sets a strong baseline for image-captioning model compression and could be used in practical applications.
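The abstract names pruning and quantization without detailing the variants used. As a rough illustration only, the Python sketch below applies unstructured magnitude pruning and symmetric per-tensor 8-bit quantization to a synthetic weight vector, then estimates storage under a hypothetical sparse cost model (8 bits per stored value plus a 16-bit position index per nonzero). These variants, the function names, and the cost model are assumptions, not the authors' pipeline, and the size reduction it reports says nothing about caption-quality metrics such as CIDEr or SPICE.

```python
import math
import random


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning (assumed variant): zero the `sparsity`
    fraction of weights with the smallest absolute values."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to zero out
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor linear quantization to integers in [-127, 127].
    Returns the integer codes and the scale needed to dequantize them."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def storage_bytes(values, bits_per_value, bits_per_index=0):
    """Hypothetical cost model: each nonzero value is stored with
    `bits_per_value` bits plus an optional `bits_per_index`-bit position."""
    nonzero = sum(1 for v in values if v != 0)
    return math.ceil(nonzero * (bits_per_value + bits_per_index) / 8)


# Usage on a synthetic layer (not weights from any captioning model).
random.seed(0)
w = [random.gauss(0.0, 0.05) for _ in range(10_000)]
dense_fp32 = len(w) * 4                      # 4 bytes per float32 weight
pruned = magnitude_prune(w, sparsity=0.75)
codes, scale = quantize_int8(pruned)
compressed = storage_bytes(codes, bits_per_value=8, bits_per_index=16)
print(f"dense: {dense_fp32} B, compressed: {compressed} B, "
      f"reduction: {1 - compressed / dense_fp32:.1%}")
```

Pruning only saves memory if the zeros are not stored, which is why the cost model charges an index per surviving weight; quantization then shrinks each surviving value from 32 to 8 bits. Architectural changes, the third technique the abstract lists, are not modeled here.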
Pages: 14