I read, I saw, I tell: Texts Assisted Fine-Grained Visual Classification

Cited by: 23
Authors
Li, Jingjing [1 ]
Zhu, Lei [2 ]
Huang, Zi [3 ]
Lu, Ke [1 ]
Zhao, Jidong [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Shandong Normal Univ, Jinan, Peoples R China
[3] Univ Queensland, Sch ITEE, Brisbane, Qld, Australia
Source
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Fine-grained visual classification; multi-modal analysis; deep learning; transfer learning;
DOI
10.1145/3240508.3240579
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
In visual classification tasks, it is hard to tell the subtle differences between one species and another similar breed. This challenging problem is generally known as Fine-Grained Visual Classification (FGVC). In this paper, we propose a novel FGVC approach called Texts Assisted Fine-Grained Visual Classification (TA-FGVC). TA-FGVC reads texts to gain attention, sees the images with the gained attention, and then tells the subtle differences. Technically, we propose a deep neural network which learns a visual-semantic embedding model. The proposed deep architecture consists of two main parts: visual localization and visual-to-semantic projection. The model is fed with both visual features extracted from raw images and semantic information learned from two sources: unannotated texts and image attributes. At the very last layer of the model, each image is embedded into the semantic space which is related to class labels. Finally, the categorization results from the visual stream and the visual-semantic stream are combined to reach the ultimate decision. Extensive experiments on open standard benchmarks verify the superiority of our model over several state-of-the-art methods.
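The late fusion of the two streams described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensions, the random weights, the cosine-similarity scoring, and the fusion weight `alpha` are all assumptions made for the sake of the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical setup: 4 classes, 6-d visual features, 5-d semantic space.
rng = np.random.default_rng(0)
n_classes, d_vis, d_sem = 4, 6, 5

W_cls = rng.normal(size=(d_vis, n_classes))      # visual-stream classifier
W_emb = rng.normal(size=(d_vis, d_sem))          # visual-to-semantic projection
class_emb = rng.normal(size=(n_classes, d_sem))  # class embeddings (e.g. from text)

x = rng.normal(size=d_vis)  # pooled visual feature of one image

# Stream 1: plain visual classification.
p_visual = softmax(x @ W_cls)

# Stream 2: embed the image into the semantic space and score each class
# by cosine similarity between the image embedding and the class embedding.
v = x @ W_emb
sims = (class_emb @ v) / (np.linalg.norm(class_emb, axis=1) * np.linalg.norm(v))
p_semantic = softmax(sims)

# Combine the two streams for the ultimate decision (alpha is assumed).
alpha = 0.5
p_final = alpha * p_visual + (1 - alpha) * p_semantic
pred = int(np.argmax(p_final))
```

Because both streams produce proper distributions, the fused scores `p_final` still sum to one, and the predicted class is simply the argmax of the combined distribution.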
Pages: 663-671
Page count: 9