Implicit and Explicit Concept Relations in Deep Neural Networks for Multi-Label Video/Image Annotation

Cited by: 44
Authors
Markatopoulou, Foteini [1, 2]
Mezaris, Vasileios [1]
Patras, Ioannis [2]
Affiliations
[1] Ctr Res & Technol Hellas, Informat Technol Inst, Thermi 57001, Greece
[2] Queen Mary Univ London, Mile End Campus, London E1 4NS, England
Funding
European Union Horizon 2020;
Keywords
Video/image concept annotation; deep learning; multi-task learning; structured outputs; multi-label learning; concept correlations; video analysis; IMAGE ANNOTATION; GRAPH;
DOI
10.1109/TCSVT.2018.2848458
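Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract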
In this paper, we propose a deep convolutional neural network (DCNN) architecture that addresses the problem of video/image concept annotation by exploiting concept relations at two different levels. At the first level, we build on ideas from multi-task learning and propose an approach to learn concept-specific representations that are sparse, linear combinations of representations of latent concepts. By enforcing the sharing of the latent concept representations, we exploit the implicit relations between the target concepts. At the second level, we build on ideas from structured output learning and propose the introduction, at training time, of a new cost term that explicitly models the correlations between the concepts. By doing so, we explicitly model the structure in the output space (i.e., the concept labels). Both mechanisms are implemented using standard convolutional layers and are incorporated in a single DCNN architecture that can then be trained end-to-end with standard back-propagation. Experiments on four large-scale video and image data sets show that the proposed DCNN improves concept annotation accuracy and outperforms the related state-of-the-art methods.
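The two levels described in the abstract can be illustrated with a small NumPy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the factorization `W = L @ S` with an L1 penalty on `S` stands in for the "sparse, linear combinations of latent concepts", and a pairwise penalty weighted by label correlations stands in for the additional cost term. The paper implements both ideas with convolutional layers inside a DCNN; this sketch uses plain matrices, and all names (`concept_weights`, `total_loss`, `lam_sparse`, `lam_corr`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def concept_weights(L, S):
    """Column t is the weight vector of concept t, built as a linear
    combination (coefficients S[:, t]) of the k shared latent columns of L."""
    return L @ S

def label_correlation(Y):
    """Pearson correlation between label columns (assumed correlation measure).
    Constant columns produce NaN, which is mapped to 0."""
    return np.nan_to_num(np.corrcoef(Y, rowvar=False))

def total_loss(X, Y, L, S, lam_sparse=0.01, lam_corr=0.1):
    # Per-concept scores from the shared latent representation.
    P = np.clip(sigmoid(X @ concept_weights(L, S)), 1e-12, 1 - 1e-12)
    # Standard multi-label binary cross-entropy.
    bce = -np.mean(Y * np.log(P) + (1 - Y) * np.log(1 - P))
    # L1 penalty encourages each concept to use few latent concepts.
    sparsity = np.abs(S).sum()
    # Assumed correlation term: positively correlated concepts are pushed
    # towards similar scores on the same sample.
    C = np.maximum(label_correlation(Y), 0.0)
    diff = P[:, :, None] - P[:, None, :]          # shape (n, T, T)
    corr_pen = np.mean(np.sum(C * diff ** 2, axis=(1, 2)))
    return bce + lam_sparse * sparsity + lam_corr * corr_pen

# Toy data: 8 samples, 5 features, 3 latent concepts, 4 target concepts.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Y = (rng.random((8, 4)) > 0.5).astype(float)
L = rng.normal(size=(5, 3))
S = rng.normal(size=(3, 4))
loss = total_loss(X, Y, L, S)
print("toy loss:", loss)
```

In an actual network, `L` and `S` would be trainable layers updated by back-propagation, and the correlation term would be applied only at training time, as the abstract states.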
Pages: 1631-1644
Number of pages: 14