Sparse Multi-Modal Topical Coding for Image Annotation

Cited by: 11

Authors:

Song, Lingyun [1]

Luo, Minnan [1]

Liu, Jun [1]

Zhang, Lingling [1]

Qian, Buyue [1]

Li, Max Haifei [2]

Zheng, Qinghua [1]

Affiliations:

[1] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, SPKLSTN Lab, Xian 710049, Peoples R China

[2] Union Univ, Dept Comp Sci, Jackson, TN 38305 USA

Source:

NEUROCOMPUTING | 2016 / Vol. 214

Funding:

National Science Foundation (USA);

Keywords:

Topic models; Sparse latent representation; Image annotation; Image retrieval; REGULARIZATION; REPRESENTATION; COMPLETION;

DOI:

10.1016/j.neucom.2016.06.005

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image annotation plays a significant role in large-scale image understanding, indexing and retrieval. Probabilistic Topic Models (PTMs) attempt to address this issue by learning latent representations of input samples, and existing studies have shown them to be effective. Though useful, PTMs have some limitations in interpreting the latent representations of images and texts, which, if addressed, would broaden their applicability. In this paper, we introduce sparsity to PTMs to improve the interpretability of the inferred latent representations. Extending Sparse Topical Coding, which was originally designed for unimodal document learning, we propose a non-probabilistic formulation of PTMs for automatic image annotation, namely Sparse Multi-Modal Topical Coding. Beyond controlling the sparsity, our model can capture more compact correlations between words and image regions. Empirical results on benchmark datasets show that our model outperforms the baseline models on automatic image annotation and text-based image retrieval. (C) 2016 Elsevier B.V. All rights reserved.


Pages: 162-174

Page count: 13
