Leveraging multi-modal fusion for graph-based image annotation

Cited by: 5

Authors:

Amiri, S. Hamid [1]

Jamzad, Mansour [2]

Affiliations:

[1] Shahid Rajaee Teacher Training Univ, Dept Comp Engn, Tehran, Iran

[2] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2018 / Vol. 55

Keywords:

Image annotation; Tag; Manifold; Multi-modal representation; Graph-based learning; Supergraph;

DOI:

10.1016/j.jvcir.2018.08.012

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

When each visual feature is treated as one modality in the image annotation task, efficient fusion of the different modalities is essential in graph-based learning. Traditional graph-based methods use one node per image and combine its visual features into a single descriptor before constructing the graph. In this paper, we propose an approach that constructs a subgraph for each modality, in which the edges of each subgraph are determined by a search-based approach that handles the class-imbalance challenge in annotation datasets. The subgraphs are then connected to each other to form a supergraph. This is followed by a learning framework that infers the tags of unannotated images on the supergraph. The proposed approach takes advantage of graph-based semi-supervised learning and multi-modal representation simultaneously. We evaluate the performance of the proposed approach on different datasets. The results reveal that the proposed approach improves the accuracy of annotation systems. (C) 2018 Elsevier Inc. All rights reserved.
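The pipeline outlined in the abstract (one subgraph per modality, subgraphs linked into a supergraph, tags inferred by semi-supervised learning on that supergraph) can be illustrated with a minimal sketch. This is not the authors' method: plain k-nearest-neighbour edges stand in for the search-based, imbalance-aware edge selection, identity links between copies of the same image stand in for the subgraph connections, and standard label spreading stands in for the paper's learning framework. All function names and parameters (`knn_subgraph`, `build_supergraph`, `propagate_tags`, `annotate`, `k`, `alpha`, `cross_weight`) are hypothetical.

```python
import numpy as np


def knn_subgraph(features, k):
    """Symmetric binary k-NN adjacency for one modality (assumed edge rule)."""
    n = features.shape[0]
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # no self-loops
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)


def build_supergraph(modalities, k=2, cross_weight=1.0):
    """Place one subgraph per modality on the block diagonal and link the
    copies of the same image across modalities (assumed connection rule)."""
    n, m = modalities[0].shape[0], len(modalities)
    S = np.zeros((m * n, m * n))
    for a, feats in enumerate(modalities):
        S[a * n:(a + 1) * n, a * n:(a + 1) * n] = knn_subgraph(feats, k)
    for a in range(m):
        for b in range(m):
            if a != b:
                S[a * n:(a + 1) * n, b * n:(b + 1) * n] = cross_weight * np.eye(n)
    return S


def propagate_tags(S, Y, alpha=0.9, iters=50):
    """Label spreading: F <- alpha * D^-1/2 S D^-1/2 F + (1 - alpha) * Y."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))  # degrees are > 0 when k >= 1
    S_norm = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * (S_norm @ F) + (1 - alpha) * Y
    return F


def annotate(modalities, tags, labeled_mask, k=2):
    """Return per-image tag scores averaged over the modality copies."""
    n, m = modalities[0].shape[0], len(modalities)
    Y = np.where(labeled_mask[:, None], tags, 0.0)  # hide tags of unlabeled images
    F = propagate_tags(build_supergraph(modalities, k), np.tile(Y, (m, 1)))
    return F.reshape(m, n, -1).mean(axis=0)
```

Here `modalities` is a list of feature matrices with one row per image, `tags` is a binary image-by-tag matrix, and `labeled_mask` marks which rows of `tags` are known. Keeping a separate node per image and modality, rather than concatenating features into one descriptor, is the design choice the abstract contrasts with traditional graph-based methods.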

Citation

Pages: 816-828

Page count: 13

40 references in total

[1] Efficient multi-modal fusion on supergraph for scalable image annotation [J].

Amiri, S. Hamid ;

Jamzad, Mansour .

PATTERN RECOGNITION, 2015, 48 (07) :2241-2253

[2]

[Anonymous], 2003, P 20 INT C MACH LEAR

[3]

[Anonymous], 2011, ACM T INTEL SYST TEC, DOI 10.1145/1899412.1899418

[4]

[Anonymous], 2014, ARXIV PREPRINT ARXIV

[5]

[Anonymous], 2006, INT WORKSH ONTOIMAGE

[6] Multimodal fusion for multimedia analysis: a survey [J].

Atrey, Pradeep K. ;

Hossain, M. Anwar ;

El Saddik, Abdulmotaleb ;

Kankanhalli, Mohan S. .

MULTIMEDIA SYSTEMS, 2010, 16 (06) :345-379

[7]

Ballan L., 2014, P ACM INT C MULTIMED

[8]

Belkin M., 2002, NIPS

[9]

Belkin M, 2006, J MACH LEARN RES, V7, P2399

[10] From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images [J].

Bruckstein, Alfred M. ;

Donoho, David L. ;

Elad, Michael .

SIAM REVIEW, 2009, 51 (01) :34-81
