Visual-Semantic Alignment Across Domains Using a Semi-Supervised Approach

Cited by: 2

Authors:

Carraggi, Angelo [1]

Cornia, Marcella [1]

Baraldi, Lorenzo [1]

Cucchiara, Rita [1]

Affiliations:

[1] Univ Modena & Reggio Emilia, Modena, Italy

Source:

COMPUTER VISION - ECCV 2018 WORKSHOPS, PT VI | 2019 / Vol. 11134

Keywords:

Multi-modal retrieval; Visual-semantic embeddings; Semi-supervised learning

DOI:

10.1007/978-3-030-11024-6_47

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual-semantic embeddings have been extensively used as a powerful model for cross-modal retrieval of images and sentences. In this setting, data coming from different modalities can be projected into a common embedding space, in which distances can be used to infer the similarity between pairs of images and sentences. While this approach has shown impressive performance in fully supervised settings, its application to semi-supervised scenarios has rarely been investigated. In this paper, we propose a domain adaptation model for cross-modal retrieval, in which the knowledge learned from a supervised dataset can be transferred to a target dataset in which the pairing between images and sentences is not known, or not useful for training due to the limited size of the set. Experiments are performed on two unsupervised target scenarios, related to the fashion and cultural heritage domains respectively. Results show that our model is able to effectively transfer the knowledge learned on ordinary visual-semantic datasets, achieving promising results. As an additional contribution, we collect and release the dataset used for the cultural heritage domain.

Citation:

Pages: 625-640

Page count: 16

