Automatic image captioning combining natural language processing and deep neural networks

Cited by: 11
Authors
Rinaldi, Antonio M. [1]
Russo, Cristiano [1]
Tommasino, Cristian [1]
Affiliations
[1] Univ Naples Federico II, Dept Elect Engn & Informat Technol, IKNOS LAB Intelligent & Knowledge Syst LUPT, Via Claudio 21, I-80125 Naples, Italy
Keywords
Object detection; Image captioning; Deep neural networks; Semantic-instance segmentation
DOI
10.1016/j.rineng.2023.101107
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
An image contains a great deal of information that humans can detect in a very short time. Image captioning aims to capture this information by describing the image content through image and text processing techniques. One peculiarity of the proposed approach is the combination of multiple networks to capture as many semantically distinct features as possible. In this work, our goal is to demonstrate that a strategy combining existing methods can efficiently improve performance in object detection tasks compared with the performance achieved by each method tested individually. The approach uses different deep neural networks that perform two levels of hierarchical object detection in an image. The results are combined and used by a captioning module that generates image captions through natural language processing techniques. Several experimental results are reported and discussed to show the effectiveness of our framework. The combination strategy also shows a gain in precision over single models.
Pages: 14