Cross-media retrieval method based on content correlation

Authors
Zhang, Hong [1, 2]
Wu, Fei [2 ]
Zhuang, Yue-Ting [2 ]
Chen, Jian-Xun [1 ]
Affiliations
[1] College of Computer Science and Technology, Wuhan University of Science and Technology
[2] Institute of Artificial Intelligence, Zhejiang University
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2008 / Vol. 31 / No. 05
Keywords
Canonical correlation; Cross-media retrieval; Heterogeneity; Relevance feedback; Subspace mapping
DOI
10.3724/sp.j.1016.2008.00820
Abstract
Most traditional content-based multimedia retrieval methods, such as image, audio, and video retrieval, are designed for data of a single modality. This paper proposes a novel cross-media retrieval approach that processes multimedia data of different modalities and measures cross-media similarity, such as image-audio similarity. First, a statistical method is used to learn the canonical correlation between the low-level feature spaces of different modalities. Then, a subspace mapping is designed to build an isomorphic subspace and resolve the heterogeneity between different low-level feature vectors. This subspace contains media objects of different modalities, each represented by an isomorphic vector. Because the canonical correlation among multimedia objects is preserved as far as possible during the mapping, cross-media similarity can be estimated with a defined distance metric. Furthermore, relevance feedback provided by users is used to learn prior knowledge and refine the multimedia topology in the subspace, so that cross-media similarity becomes more consistent with human perception. Experiments and comparisons are carried out on image and audio data. Given the same visual and auditory features, the new approach outperforms the ICA, PCA, and PLS methods in both precision and recall. Overall, the cross-media retrieval results between images and audio clips are very encouraging.
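The subspace-mapping idea in the abstract can be illustrated with a minimal sketch of linear canonical correlation analysis in NumPy. This is not the paper's implementation: the function names (`fit_cca`, `project`, `cross_media_similarity`), the ridge term `reg`, the synthetic "image" and "audio" features, and the choice of cosine similarity as the distance metric are all assumptions made for illustration, and the relevance-feedback step is not shown.

```python
import numpy as np

def fit_cca(X, Y, n_components, reg=1e-6):
    """Fit linear CCA on paired samples X (n, dx) and Y (n, dy).

    `reg` is an assumed ridge term that keeps the covariances invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mx, my, Wx, Wy, s[:n_components]

def project(F, mean, W):
    """Map low-level features of one modality into the shared subspace."""
    return (F - mean) @ W

def cross_media_similarity(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

# Synthetic stand-ins for paired image and audio features (assumed data):
# both views are noisy linear functions of the same 3-dimensional latent.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
image_feats = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
audio_feats = latent @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(200, 5))

mx, my, Wx, Wy, corrs = fit_cca(image_feats, audio_feats, n_components=3)
img_sub = project(image_feats, mx, Wx)
aud_sub = project(audio_feats, my, Wy)
sim = cross_media_similarity(img_sub, aud_sub)

# Query by image 0: rank audio clips by similarity in the shared subspace.
ranking = np.argsort(-sim[0])
```

Once both modalities live in the same subspace, retrieval reduces to a nearest-neighbour search over `sim`, which is why the heterogeneous feature dimensions (8 and 5 here) no longer matter.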
Pages: 820-826
Page count: 6