What we see in a photograph: content selection for image captioning

Cited by: 6

Authors:

Barlas, Georgios [1]

Veinidis, Christos [1]

Arampatzis, Avi [1]

Affiliations:

[1] Democritus Univ Thrace, Dept Elect & Comp Engn, Xanthi 67100, Greece

Source:

VISUAL COMPUTER | 2021, Vol. 37, Issue 06

Keywords:

Image captioning; Computer vision; Image cognition; REPRESENTATION; WORDNET; MODELS;

DOI:

10.1007/s00371-020-01867-9

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

We propose and experimentally investigate the usefulness of several features for selecting image content (objects) suitable for image captioning. The approach explores three broad categories of features: geometric, conceptual, and visual. Experiments suggest that widely known geometric 'rules' in art-aesthetics or photography (such as the golden ratio or the rule-of-thirds) and facts about the human visual system (such as its horizontal angle being wider than its vertical one) provide no useful information for the task. Human captioners seem to prefer large, elongated (but not in the golden ratio) objects, positioned near the image center, irrespective of orientation. Conceptually, the preferred objects are either too specific or too general, and animate things are almost always mentioned; furthermore, some evidence is found for selecting diverse objects in order to achieve maximal image coverage in captions. Visual object features such as saliency, depth, edges, entropy, and contrast are all found to provide useful information. Beyond evaluating features in isolation, we investigate how well they combine by performing feature and feature-category ablation studies, leading to an effective set of features that can prove useful for operational systems. Moreover, we propose alternative ways for feature engineering and evaluation that address the drawbacks of the evaluation methodology proposed in past literature.


Pages: 1309-1326

Page count: 18

