Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach

Cited by: 2
Authors
Carraggi, Angelo [1]
Cornia, Marcella [1]
Baraldi, Lorenzo [1]
Cucchiara, Rita [1]
Affiliation
[1] Univ Modena & Reggio Emilia, Modena, Italy
Source
COMPUTER VISION - ECCV 2018 WORKSHOPS, PT VI | 2019 / Vol. 11134
Keywords
Multi-modal retrieval; Visual-semantic embeddings; Semi-supervised learning
DOI
10.1007/978-3-030-11024-6_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval of images and sentences. In this setting, data coming from different modalities can be projected into a common embedding space, in which distances can be used to infer the similarity between pairs of images and sentences. While this approach has shown impressive performance in fully supervised settings, its application to semi-supervised scenarios has rarely been investigated. In this paper we propose a domain adaptation model for cross-modal retrieval, in which the knowledge learned from a supervised dataset can be transferred to a target dataset in which the pairing between images and sentences is either not known or not useful for training due to the limited size of the set. Experiments are performed on two unsupervised target scenarios, related to the fashion and cultural heritage domains respectively. Results show that our model is able to effectively transfer the knowledge learned on ordinary visual-semantic datasets, achieving promising results. As an additional contribution, we collect and release the dataset used for the cultural heritage domain.
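The retrieval mechanism summarized in the abstract (projecting both modalities into one space and ranking by similarity) can be illustrated with a minimal sketch. It assumes that image and sentence features have already been extracted and are mapped into the shared space by plain linear projections; the function names `project`, `rank_sentences`, and `hinge_ranking_loss`, the toy weights, and the margin of 0.2 are all hypothetical. The abstract does not state the training objective, so the hinge-based triplet ranking loss below is only a common choice for visual-semantic embeddings, not necessarily the loss used by the authors, and the sketch does not model the domain adaptation component at all.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(features: Sequence[float], weights: Matrix) -> Vector:
    """Map a feature vector into the shared space with a linear layer (W @ x).

    Hypothetical: a trained model would learn `weights`; here they are given.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def rank_sentences(image_emb: Sequence[float],
                   sentence_embs: Sequence[Sequence[float]]) -> List[int]:
    """Return sentence indices sorted from most to least similar to the image."""
    scores = [similarity(image_emb, s) for s in sentence_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hinge_ranking_loss(sim: Matrix, margin: float = 0.2) -> float:
    """Sum-of-hinges triplet ranking loss over a batch (an assumed objective).

    sim[i][j] is the similarity of image i and sentence j; matching pairs
    lie on the diagonal and every off-diagonal entry is treated as a negative.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i as query, sentence j as a negative.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Sentence i as query, image j as a negative.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss


if __name__ == "__main__":
    # Toy, hypothetical projections: 3-d image and 2-d sentence features -> 2-d space.
    w_img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    w_txt = [[0.0, 1.0], [1.0, 0.0]]

    image = l2_normalize(project([3.0, 4.0, 9.0], w_img))
    sentences = [l2_normalize(project(s, w_txt))
                 for s in ([0.0, 1.0], [4.0, 3.0], [1.0, -1.0])]
    print(rank_sentences(image, sentences))  # [1, 0, 2]
    print(hinge_ranking_loss([[0.5, 0.6], [0.1, 0.9]]))
```

Sentence 1 ranks first because its projection coincides with the image embedding. Normalizing before the dot product makes the distance in the shared space a cosine similarity, so sentence-to-image retrieval works symmetrically by ranking images against a sentence embedding.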
Pages: 625-640
Page count: 16