Imageability- and Length-Controllable Image Captioning

Cited by: 6
Authors
Kastner, Marc A. [1 ]
Umemura, Kazuki [2 ]
Ide, Ichiro [2 ,3 ]
Kawanishi, Yasutomo [2 ,4 ]
Hirayama, Takatsugu [2 ,5 ]
Doman, Keisuke [6 ]
Deguchi, Daisuke [2 ]
Murase, Hiroshi [2 ]
Satoh, Shin'Ichi [1 ]
Affiliations
[1] Natl Inst Informat, Digital Content & Media Sci Res Div, Chiyoda Ku, Tokyo 1018430, Japan
[2] Nagoya Univ, Grad Sch Informat, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[3] Nagoya Univ, Math & Data Sci Ctr, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[4] RIKEN, Guardian Robot Project, Informat Res & Dev & Strategy Headquarters, Seika, Kyoto 6190288, Japan
[5] Univ Human Environm, Fac Human Environm, Okazaki, Aichi 4443505, Japan
[6] Chukyo Univ, Sch Engn, Toyota, Aichi 4700393, Japan
Source
IEEE ACCESS | 2021 / Vol. 9 / Issue 09
Keywords
Visualization; Transformers; Task analysis; Sports; Licenses; Informatics; Training; Machine learning; Semantics; Image captioning; Psycholinguistics; Database; Ratings
DOI
10.1109/ACCESS.2021.3131393
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Image captioning methods can achieve strong performance when generating captions for general purposes, but it remains difficult to adjust the generated captions to different applications. In this paper, we propose an image captioning method that can generate both imageability- and length-controllable captions. The imageability parameter adjusts the level of visual descriptiveness of the caption, making it either more abstract or more concrete. In contrast, the length parameter only adjusts the length of the caption while keeping the visual descriptiveness at a similar level. Based on a transformer architecture, our model is trained on an augmented dataset with captions diversified across different degrees of descriptiveness. The resulting model can control both imageability and length, making it possible to tailor the output towards various applications. Experiments show that we maintain captioning performance similar to that of comparison methods, while being able to control the visual descriptiveness and the length of the generated captions. A subjective evaluation with human participants also shows a significant correlation between the target imageability and human expectations. Thus, we confirm that the proposed method is a promising step towards tailoring image captions more closely to specific applications.
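The abstract describes conditioning a transformer-based caption decoder on two control signals, an imageability level and a target length. The following is a minimal, hypothetical PyTorch sketch of one way such control could be wired in, by prepending learned control embeddings to the decoder input. The module names, bucket counts, and feature dimensions are illustrative assumptions and do not reflect the authors' actual implementation.

import torch
import torch.nn as nn

class ControllableCaptioner(nn.Module):
    def __init__(self, vocab_size=10000, d_model=512, feat_dim=2048,
                 n_imageability_levels=5, n_length_levels=4, n_heads=8, n_layers=3):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # One learned embedding per discretized control level (assumed bucket counts).
        self.imageability_emb = nn.Embedding(n_imageability_levels, d_model)
        self.length_emb = nn.Embedding(n_length_levels, d_model)
        self.visual_proj = nn.Linear(feat_dim, d_model)  # project image region features
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, visual_feats, caption_tokens, imageability_level, length_level):
        # visual_feats: (B, R, feat_dim) region features; caption_tokens: (B, T) word ids
        memory = self.visual_proj(visual_feats)                     # (B, R, d_model)
        ctrl = torch.stack([self.imageability_emb(imageability_level),
                            self.length_emb(length_level)], dim=1)  # (B, 2, d_model)
        tgt = torch.cat([ctrl, self.token_emb(caption_tokens)], dim=1)
        # Causal mask so each position attends only to earlier positions.
        sz = tgt.size(1)
        mask = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(hidden[:, 2:])  # drop the two control positions

# Toy forward pass with random inputs.
model = ControllableCaptioner()
logits = model(torch.randn(2, 36, 2048), torch.randint(0, 10000, (2, 12)),
               torch.tensor([4, 1]), torch.tensor([2, 0]))
print(logits.shape)  # torch.Size([2, 12, 10000])

At inference time, varying only the imageability or length index while holding the image fixed would steer the generated caption, which mirrors the control behavior the abstract describes.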
Pages: 162951-162961
Number of pages: 11