Understanding multimedia document semantics for cross-media retrieval

Cited by: 0

Authors:

Wu, F [1]

Yang, Y [1]

Zhuang, YT [1]

Pan, YH [1]

Affiliation:

[1] Zhejiang Univ, Coll Comp Sci & Engn, Hangzhou 310027, Peoples R China

Source:

ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2005, PT 1 | 2005 / Volume 3767

Keywords:

cross-media retrieval; multimedia document; manifold; CLASSIFICATION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A multimedia document (MMD), such as a Web page or a multimedia encyclopedia, is composed of media objects of different modalities, and its integrated semantics are expressed by the combination of all the media objects in it. Because the content of MMDs is enormous and their number is increasing rapidly, effective management of MMDs is in great demand. It is also valuable to provide users with cross-media retrieval facilities so that they can query media objects by examples of different modalities; for example, a user may query an MMD (or an image) by submitting an audio clip, and vice versa. However, two challenges stand in the way of these goals. First, how can we represent an MMD and fuse its media objects to achieve cross-indexing and facilitate cross-media retrieval? Second, how can we understand MMD semantics? Taking these two problems into account, we give a definition of MMD and propose a manifold learning method to discover MMD semantics. We first construct an MMD semi-semantic graph (SSG) and then adopt multidimensional scaling to create an MMD semantic space (MMDSS). We also propose two stages of feedback: the first is used to refine the SSG, and the second is used to introduce a new MMD that is not yet in the MMDSS into it. Since all of the MMDs and their component media objects of different modalities lie in the MMDSS, cross-media retrieval can be easily performed. Experimental results are encouraging and indicate that the proposed approach is effective.
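The graph-then-scaling pipeline named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the SSG is built, how its edges are weighted, or how feedback is applied, so everything below is an assumption made for illustration: a toy six-document graph with made-up edge weights, shortest-path distances over that graph (an Isomap-style choice, not confirmed by the source), classical (Torgerson) multidimensional scaling, and nearest-neighbor lookup in the resulting space. The names `shortest_paths`, `classical_mds`, and `nearest` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def shortest_paths(weights):
    """All-pairs shortest paths (Floyd-Warshall) on a dense matrix; inf means no edge."""
    d = weights.copy()
    for k in range(d.shape[0]):
        # Relax every pair (i, j) through intermediate node k.
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d

def classical_mds(dist, dim=2):
    """Classical (Torgerson) MDS: embed points so Euclidean distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]                 # indices of the largest ones
    scale = np.sqrt(np.clip(vals[top], 0.0, None))     # clip tiny negative eigenvalues
    return vecs[:, top] * scale

def nearest(coords, query_index, k=2):
    """Indices of the k points closest to the query point in the embedded space."""
    d = np.linalg.norm(coords - coords[query_index], axis=1)
    d[query_index] = np.inf                            # exclude the query itself
    return np.argsort(d)[:k]

# Toy graph (hypothetical weights): documents 0-2 and 3-5 form two tight groups
# joined by one weak (long) edge between documents 2 and 3.
n = 6
ssg = np.full((n, n), np.inf)
np.fill_diagonal(ssg, 0.0)
for a, b, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 10.0), (3, 4, 1.0), (4, 5, 1.0)]:
    ssg[a, b] = ssg[b, a] = w

geo = shortest_paths(ssg)          # graph distances between all documents
coords = classical_mds(geo, dim=2) # coordinates in the toy "semantic space"
neighbors_of_0 = nearest(coords, 0, k=2)
```

In this toy case the graph is a simple chain, so the embedding reproduces the graph distances and the neighbors of document 0 are the other members of its group. Placing individual media objects (images, audio clips) in the same space, and the two feedback stages, are outside what this sketch covers.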


Pages: 993-1004

Page count: 12
