In this paper, we summarize our work on cross-media retrieval, where the query and the content to be retrieved are of different media types. We study cross-media retrieval in the context of two popular applications in multimedia retrieval: image retrieval by textual queries and sentence retrieval by visual queries. For image retrieval by textual queries, we propose text2image, which converts the problem of computing cross-media relevance between an image and a textual query into comparing visual similarity among images. We also propose cross-media relevance fusion, a conceptual framework that combines multiple cross-media relevance estimators. These two techniques resulted in a winning entry in the Microsoft Image Retrieval Challenge at ACM MM 2015. For sentence retrieval by visual queries, we propose to compute cross-media relevance exclusively in a visual space. To this end, we contribute Word2VisualVec, a deep neural network architecture that learns to predict a visual feature representation from textual input. With the proposed Word2VisualVec model, we won the Video to Text Description task at TRECVID 2016.
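To make the text2image idea concrete, below is a minimal sketch, not the system described above. It assumes the textual query has already been used to retrieve a set of exemplar images (for instance by tag matching against a web image collection) and that all images come with precomputed visual features; the function name `text2image_relevance`, the parameter `k`, and the top-k averaging scheme are illustrative assumptions.

```python
import numpy as np

def cosine_sim(vec, mat):
    """Cosine similarity between one feature vector and each row of a matrix."""
    vec = vec / (np.linalg.norm(vec) + 1e-12)
    mat = mat / (np.linalg.norm(mat, axis=1, keepdims=True) + 1e-12)
    return mat @ vec

def text2image_relevance(candidate_feat, exemplar_feats, k=5):
    """Cross-media relevance of a candidate image for a textual query,
    computed purely in the visual space: the query is represented by the
    visual features of exemplar images retrieved for the query text, and
    relevance is the mean similarity to the k most similar exemplars.
    """
    sims = cosine_sim(candidate_feat, exemplar_feats)
    return float(np.sort(sims)[-k:].mean())

# Usage sketch with stand-in features (dim 2048, e.g., a CNN layer):
exemplars = np.random.randn(20, 2048)   # images retrieved for the query text
candidate = np.random.randn(2048)       # image to be scored
score = text2image_relevance(candidate, exemplars)
```

Candidate images can then be ranked by this score, and cross-media relevance fusion would combine such a score with other relevance estimators.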
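Similarly, the following is a minimal sketch of the Word2VisualVec idea, assuming PyTorch: a multilayer perceptron regresses a visual feature from a sentence encoding, so that sentence retrieval by visual queries reduces to similarity search in the visual space. The layer sizes, the mean-pooled sentence encoding implied by `text_dim`, and the training details are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class Word2VisualVec(nn.Module):
    """MLP that maps a sentence encoding into a visual feature space,
    so text-video relevance can be computed in that space alone."""
    def __init__(self, text_dim=500, hidden_dim=1000, visual_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, visual_dim),
        )

    def forward(self, sentence_vec):
        return self.net(sentence_vec)

# Training step (sketch): regress the visual feature of a video from
# the encoding of a sentence that describes it.
model = Word2VisualVec()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

sentence_vec = torch.randn(32, 500)   # stand-in sentence encodings
video_feat = torch.randn(32, 2048)    # stand-in CNN video features

pred = model(sentence_vec)
loss = loss_fn(pred, video_feat)
opt.zero_grad()
loss.backward()
opt.step()

# Retrieval: rank candidate sentences for a given video by cosine
# similarity between the video feature and each predicted vector.
sims = torch.nn.functional.cosine_similarity(pred, video_feat)
```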