Text-Embedded Bilinear Model for Fine-Grained Visual Recognition

Cited by: 6
Authors
Sun, Liang [1 ,2 ]
Guan, Xiang [1 ,2 ]
Yang, Yang [3 ]
Zhang, Lei [4 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[3] Univ Elect Sci & Technol China, Inst Elect & Informat Engn UESTC Guangdong, Chengdu, Peoples R China
[4] Chongqing Univ, Chongqing, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
fine-grained visual recognition; multi-modal analysis; cross-layer bilinear network; deep learning
DOI
10.1145/3394171.3413638
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained visual recognition, which aims to identify subcategories of the same base-level category, is a challenging task because of its large intra-class variances and small inter-class variances. Human beings can perform object recognition based not only on visual appearance but also on knowledge from texts, as texts can point out the discriminative parts or characteristics that are often the key to distinguishing different subcategories. This involuntary transfer from human textual attention to visual attention suggests that texts are able to assist fine-grained recognition. In this paper, we propose a Text-Embedded Bilinear (TEB) model which incorporates texts as extra guidance for fine-grained recognition. Specifically, we first construct a text-embedded network that embeds the text feature into discriminative image feature learning to obtain an embedded feature. In addition, since cross-layer part feature interaction and fine-grained feature learning are mutually correlated and can reinforce each other, we also extract a candidate feature from the text encoder and embed it into the inter-layer feature of the image encoder to obtain an embedded candidate feature. Finally, we utilize a cross-layer bilinear network to fuse the two embedded features. Compared with state-of-the-art methods on the widely used CUB-200-2011 and Oxford Flowers-102 fine-grained image recognition datasets, the experimental results demonstrate that our TEB model achieves the best performance.
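To make the pipeline described in the abstract concrete, the following is a minimal, self-contained PyTorch sketch of the general idea: a text feature modulates two image-encoder feature maps taken from different layers, and the two text-embedded maps are fused by cross-layer bilinear pooling. All module names, feature dimensions, and the channel-wise modulation scheme below are assumptions made for illustration; this is not the authors' released implementation.

# Illustrative sketch only: the exact TEB architecture is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayerBilinearFusion(nn.Module):
    """Toy cross-layer bilinear fusion of two text-embedded image features."""
    def __init__(self, img_dim=512, txt_dim=256, num_classes=200):
        super().__init__()
        # Project the text feature so it can modulate image features (assumed design).
        self.txt_to_img = nn.Linear(txt_dim, img_dim)
        self.txt_to_mid = nn.Linear(txt_dim, img_dim)
        self.classifier = nn.Linear(img_dim * img_dim, num_classes)

    def forward(self, img_feat, mid_feat, txt_feat):
        # img_feat, mid_feat: (B, C, H, W) maps from two layers of the image encoder.
        # txt_feat: (B, txt_dim) sentence embedding from a text encoder.
        B, C, H, W = img_feat.shape
        # Embed the text feature into each image feature map; channel-wise gating is
        # one simple choice, the paper's actual embedding may differ.
        emb = img_feat * torch.sigmoid(self.txt_to_img(txt_feat)).view(B, C, 1, 1)
        cand = mid_feat * torch.sigmoid(self.txt_to_mid(txt_feat)).view(B, C, 1, 1)
        # Cross-layer bilinear pooling: outer product of the two embedded features,
        # averaged over spatial locations.
        emb = emb.reshape(B, C, H * W)
        cand = cand.reshape(B, C, H * W)
        bilinear = torch.bmm(emb, cand.transpose(1, 2)) / (H * W)  # (B, C, C)
        bilinear = bilinear.reshape(B, -1)
        # Signed square-root and L2 normalization, as is common for bilinear features.
        bilinear = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-12)
        bilinear = F.normalize(bilinear)
        return self.classifier(bilinear)

# Usage with random stand-in features (no real encoders are loaded here).
model = CrossLayerBilinearFusion()
img_feat = torch.randn(2, 512, 14, 14)   # e.g. a late conv layer
mid_feat = torch.randn(2, 512, 14, 14)   # e.g. an earlier conv layer
txt_feat = torch.randn(2, 256)           # text-encoder output
logits = model(img_feat, mid_feat, txt_feat)
print(logits.shape)  # torch.Size([2, 200])

The signed square-root and L2 normalization follow common practice for bilinear pooling; how TEB actually combines the embedded feature and the embedded candidate feature is specified in the paper itself.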
Pages: 211-219
Number of pages: 9