Cross-media retrieval method based on content correlation

Cited by: 0

Authors:

Zhang, Hong [1,2]

Wu, Fei [2]

Zhuang, Yue-Ting [2]

Chen, Jian-Xun [1]

Affiliations:

[1] College of Computer Science and Technology, Wuhan University of Science and Technology

[2] Institute of Artificial Intelligence, Zhejiang University

Source:

Jisuanji Xuebao/Chinese Journal of Computers | 2008 / Vol. 31 / No. 05

Keywords:

Canonical correlation; Cross-media retrieval; Heterogeneity; Relevance feedback; Subspace mapping;

DOI:

10.3724/sp.j.1016.2008.00820

Abstract:

Most traditional content-based multimedia retrieval methods, such as image retrieval, audio retrieval, and video retrieval, are designed for data of a single modality. This paper proposes a novel cross-media retrieval approach that can process multimedia data of different modalities and measure cross-media similarity, such as image-audio similarity. First, a statistical method is used to learn the canonical correlation between the low-level feature spaces of different modalities. Then, a subspace mapping is designed to build an isomorphic subspace and resolve the heterogeneity between different low-level feature vectors. This subspace contains media objects of different modalities, and each media object is represented by an isomorphic vector. Because the canonical correlations among multimedia objects are preserved as far as possible during the mapping, cross-media similarity can be estimated with a defined distance metric. Furthermore, relevance feedback provided by users is used to learn prior knowledge and refine the multimedia topology in the subspace, so that cross-media similarity becomes more consistent with human perception. Image and audio data are used for the experiments and comparisons. Given the same visual and auditory features, the new approach outperforms the ICA, PCA, and PLS methods in both precision and recall. Overall, the cross-media retrieval results between images and audio clips are very encouraging.
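The pipeline described in the abstract (learn canonical correlation between two feature spaces, map both modalities into a shared subspace, then rank by a distance metric) can be illustrated with a minimal sketch. This is standard linear CCA in NumPy, not the authors' implementation: the synthetic "image" and "audio" features, the ridge term `reg`, the function names `fit_cca` and `cosine_distance_matrix`, and the choice of cosine distance as the metric are all assumptions made for illustration. The relevance feedback step is not covered.

```python
import numpy as np

def fit_cca(X, Y, n_components, reg=1e-3):
    """Fit linear CCA on paired feature matrices X (n x p) and Y (n x q)."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Covariance blocks; reg is an assumed ridge term for numerical stability.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view with its Cholesky factor: M = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of M are the canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mean_x, mean_y, Wx, Wy, s[:n_components]

def cosine_distance_matrix(P, Q):
    """Pairwise cosine distance between rows of P and rows of Q."""
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    return 1.0 - Pn @ Qn.T

# Synthetic paired data (hypothetical): both modalities are noisy linear
# views of the same 3-dimensional latent content.
rng = np.random.default_rng(0)
n = 200
Z = rng.standard_normal((n, 3))
X_img = Z @ rng.standard_normal((3, 8)) + 0.05 * rng.standard_normal((n, 8))
Y_aud = Z @ rng.standard_normal((3, 6)) + 0.05 * rng.standard_normal((n, 6))

mean_x, mean_y, Wx, Wy, corrs = fit_cca(X_img, Y_aud, n_components=3)

# Map both modalities into the shared (isomorphic) subspace.
P_img = (X_img - mean_x) @ Wx
P_aud = (Y_aud - mean_y) @ Wy

# D[i, j] is the cross-media distance between image i and audio clip j.
D = cosine_distance_matrix(P_img, P_aud)

# Cross-media query: rank all audio clips for image 0, nearest first.
top5_audio_for_image0 = np.argsort(D[0])[:5]
print("canonical correlations:", corrs)
print("nearest audio clips for image 0:", top5_audio_for_image0)
```

In this sketch the heterogeneity problem disappears after projection because both `P_img` and `P_aud` live in the same 3-dimensional space, so a single distance function applies to image-audio pairs. On real features the regularization strength and the number of components would need tuning.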

Pages: 820-826

Page count: 6
