A Hybrid Approach for Semantic Image Annotation

Cited by: 1
Authors
Sezen, Arda [1]
Turhan, Cigdem [2]
Sengul, Gokhan [3]
Affiliations
[1] OSTIM Tech Univ, Dept Software Engn, TR-06374 Ankara, Turkey
[2] Atilim Univ, Dept Software Engn, TR-06830 Ankara, Turkey
[3] Atilim Univ, Dept Comp Engn, TR-06830 Ankara, Turkey
Keywords
Annotations; Ontologies; Sports; Image annotation; Semantics; Training; Computational modeling; Semantic image annotation; picture interpretation; ontology; BIG DATA; SEGMENTATION; RETRIEVAL; MACHINES;
DOI
10.1109/ACCESS.2021.3114968
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this study, a framework that generates natural language descriptions of images within a controlled environment is proposed. Previous work on neural networks mostly focused on choosing the right labels and/or increasing the number of related labels to depict an image. However, creating a textual description of an image is a completely different task, structurally, syntactically, and semantically. The proposed semantic image annotation framework presents a novel combination of deep learning models and aligned annotation results derived from the instances of the ontology classes to generate sentential descriptions of images. Our hybrid approach benefits from the unique combination of deep learning and semantic web technologies. We detect objects in unlabeled sports images using a deep learning model based on a residual network and a feature pyramid network, with the focal loss technique to obtain predictions with high probability. The proposed framework produces not only probabilistically labeled images but also contextual results obtained from a knowledge base that exploits the relationships between the objects. The framework's object detection and prediction performance is tested on two datasets: the first contains individual instances of images showing everyday scenes of common objects, and the second, a custom dataset, contains sports images collected from the web. Moreover, a sample image set is created to obtain annotation result data by applying all framework layers. Experimental results show that the framework is effective in this controlled environment and can be used with other applications via web services within the supported sports domain.
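The abstract names the focal loss technique but this record does not give its formula. As a minimal sketch, assuming the standard binary form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) from the object detection literature, the loss for a single prediction could be computed as below. The function names and the values alpha = 0.25 and gamma = 2.0 are illustrative assumptions, not settings reported by the authors.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction (illustrative sketch).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (object) or 0 (background)
    alpha, gamma: assumed hyperparameters, not taken from the paper
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one prediction, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. Larger gamma values shift the training signal toward hard, misclassified examples.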
Pages: 131977-131994
Page count: 18