A Structured Listwise Approach to Learning to Rank for Image Tagging

Cited by: 0
Authors
Sanchez, Jorge [1 ,2 ]
Luque, Franco [1 ,2 ]
Lichtensztein, Leandro [3 ]
Affiliations
[1] Consejo Nacl Invest Cient & Tecn, Cordoba, Argentina
[2] Univ Nacl Cordoba, Cordoba, Argentina
[3] Deep Vis AI Inc, Cordoba, Argentina
Source
COMPUTER VISION - ECCV 2018 WORKSHOPS, PT VI | 2019 / Vol. 11134
Keywords
Learning to rank; Zero-shot learning; Image tagging; Visual-semantic compatibility; Multimodal embedding; RELEVANCE;
DOI
10.1007/978-3-030-11024-6_42
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the growing quantity and diversity of publicly available image data, computer vision plays a crucial role in understanding and organizing visual information. Image tagging models are often used to make this data accessible and useful. Generating image labels and ranking them by their relevance to the visual content is still an open problem. In this work, we use a bilinear compatibility function inspired by zero-shot learning that allows us to rank tags according to their relevance to the image content. We propose a novel listwise structured loss formulation to learn it from data. We leverage captioned image data and propose different "tags from captions" schemes meant to capture user attention and intra-user agreement in a simple and effective manner. We evaluate our method on the COCO-Captions, PASCAL-sentences, and MIRFlickr-25k datasets, showing promising results.
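As a rough illustration of the ranking step described in the abstract, the sketch below scores each candidate tag with a bilinear compatibility function of the form F(x, y) = x^T W y and sorts the tags by score. The function name `rank_tags`, the matrix `W`, the embeddings, and the toy values are all assumptions made for illustration; they are not taken from the paper. The listwise structured loss used to learn W is not shown, since the abstract does not specify it.

```python
import numpy as np


def rank_tags(x, W, Y, tags):
    """Rank tags by a bilinear compatibility score F(x, y) = x^T W y.

    x    : (d_img,) image embedding (hypothetical)
    W    : (d_img, d_tag) compatibility matrix (hypothetical, assumed already learned)
    Y    : (num_tags, d_tag) tag embeddings, one row per tag (hypothetical)
    tags : list of tag names aligned with the rows of Y
    """
    scores = x @ W @ Y.T            # one score per tag, shape (num_tags,)
    order = np.argsort(-scores)     # indices sorted by descending score
    return [(tags[i], float(scores[i])) for i in order]


# Toy example with made-up 2-dimensional embeddings.
x = np.array([1.0, 0.0])
W = np.eye(2)
Y = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.5, 0.5]])
tags = ["dog", "car", "park"]

ranking = rank_tags(x, W, Y, tags)
for tag, score in ranking:
    print(f"{tag}: {score:.2f}")
```

Because the score is linear in both embeddings, tags not seen during training can in principle be ranked as long as an embedding for them is available, which is the property borrowed from zero-shot learning.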
Pages: 545-559
Page count: 15