Learning of Multimodal Representations With Random Walks on the Click Graph

Cited: 34
Authors
Wu, Fei [1 ]
Lu, Xinyan [1 ]
Song, Jun [1 ]
Yan, Shuicheng [2 ]
Zhang, Zhongfei [3 ]
Rui, Yong [4 ]
Zhuang, Yueting [1 ]
Affiliations
[1] Zhejiang Univ, Sch Comp Sci & Technol, Hangzhou 310027, Peoples R China
[2] Natl Univ Singapore, Dept Elect Engn, Singapore 119077, Singapore
[3] Zhejiang Univ, Dept Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China
[4] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
Cross-media search; click log; latent representation; deep learning; RANK
DOI
10.1109/TIP.2015.2507401
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In multimedia information retrieval, most classic approaches represent different modalities of media in the same feature space. With click data collected from users' search behavior, existing approaches take either one-to-one paired data (text-image pairs) or ranking examples (text-query-image and/or image-query-text ranking lists) as training examples, which do not make full use of the click data, particularly the implicit connections among the data objects. In this paper, we treat the click data as a large click graph, in which vertices are images/text queries and edges indicate clicks between an image and a query. We consider learning a multimodal representation from the perspective of encoding the explicit/implicit relevance relationships between the vertices in the click graph. By minimizing both the truncated random-walk loss and the distance between the learned representations of the vertices and their corresponding deep neural network outputs, the proposed model, named the multimodal random walk neural network (MRW-NN), can not only learn robust representations of the existing multimodal data in the click graph but also handle unseen queries and images to support cross-modal retrieval. We evaluate the latent representation learned by MRW-NN on Clickture, a public large-scale click-log data set, and show that MRW-NN achieves much better cross-modal retrieval performance on unseen queries/images than other state-of-the-art methods.
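The click-graph idea in the abstract can be illustrated with a minimal sketch. The graph data and vertex names below are hypothetical toy examples, and the walk sampler only demonstrates the graph traversal that the truncated random-walk loss would be computed over; the paper's actual loss and neural network are not shown here.

```python
import random

# Hypothetical toy click graph: bipartite, with query and image vertices.
# An edge means the image was clicked for that query.
click_graph = {
    "q:red car":    ["img:001", "img:002"],
    "q:sports car": ["img:002", "img:003"],
    "img:001": ["q:red car"],
    "img:002": ["q:red car", "q:sports car"],
    "img:003": ["q:sports car"],
}

def truncated_random_walk(graph, start, length, rng=random):
    """Sample a walk of at most `length` vertices starting at `start`.

    Walks alternate between query and image vertices, so even short
    walks surface implicit relevance (e.g. img:001 relates to img:003
    through shared queries) beyond the explicit one-to-one click pairs.
    """
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # dead end: truncate the walk early
        walk.append(rng.choice(neighbors))
    return walk

walk = truncated_random_walk(click_graph, "q:red car", 5)
print(walk)
```

Vertices that co-occur frequently in such walks would be pushed toward nearby points in the shared representation space, which is how implicit query-image relevance can be captured without explicit paired supervision.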
Pages: 630-642 (13 pages)