Kernel-Based Mixture Mapping for Image and Text Association

Times cited: 4

Authors:

Du, Youtian [1]

Wang, Xue [1]

Cui, Yunbo [1]

Wang, Hang [1]

Su, Chang [2]

Affiliations:

[1] Xi An Jiao Tong Univ, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Shaanxi, Peoples R China

[2] Cornell Univ, Weill Cornell Med, Dept Healthcare Policy & Res, New York, NY 10065 USA

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / No. 2

Keywords:

Semantics; Correlation; Probabilistic logic; Visualization; Data models; Analytical models; Optimization; Image-text association; semantic correlation; probabilistic mixture model; kernel-based mapping; MODEL; PROPAGATION; ANNOTATION; RETRIEVAL;

DOI:

10.1109/TMM.2019.2930336

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Modeling the relationships between multimodal media, including images, videos, and text, can narrow the gap between modalities and facilitate tasks such as cross-media retrieval and image annotation. In this paper, we propose a new approach called kernel-based mixture mapping (KMM) to model the semantic correlations between web images and text. First, we construct latent high-dimensional feature spaces based on kernel theory to address the nonlinearity of both the data distributions in the input spaces and the cross-modal correlation. Second, we present a probabilistic neighborhood model that describes the spatial locality of semantics, assuming that examples close to each other in the feature spaces generally share the same semantics, together with a conditional model that describes cross-modal conditional dependency. Finally, we build a probabilistic mixture model that jointly models the spatial locality of semantics and the conditional dependency between modalities. By combining nonlinear transformation with probabilistic models, KMM can address the nonlinearity of cross-modal correlation, the complexity of semantic distributions at the global scale, and the continuity of semantic distributions at the local scale. We present a hybrid optimization algorithm based on expectation-maximization and subgradient ascent to solve KMM; this algorithm avoids estimating the parameters of KMM in the high-dimensional feature space and is proved to converge to a locally optimal solution. We evaluate KMM on four public datasets. The experimental results show that our approach outperforms the compared methods in modeling the relationships between images and text.
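The abstract combines two generic ideas: distances in a kernel-induced feature space can be computed from kernel values alone, without ever forming the high-dimensional mapping, and a probabilistic neighborhood model assigns soft neighbor weights so that nearby examples tend to share semantics. The sketch below illustrates only those two ideas under stated assumptions. It is not the authors' KMM model or optimizer. The Gaussian (RBF) kernel, the exponential neighbor weighting with bandwidth `sigma`, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical kernel choice: Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2).
def rbf_kernel(x, y, gamma=0.5):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

# Kernel trick: ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y),
# so the feature-space distance never requires computing phi explicitly.
def feature_space_sq_dist(x, y, kernel=rbf_kernel):
    return kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)

# Assumed soft-neighbor model: p(j | i) proportional to exp(-d^2(i, j) / sigma), for j != i.
def neighborhood_probs(i, points, sigma=1.0, kernel=rbf_kernel):
    xi = points[i]
    weights = {}
    for j, xj in enumerate(points):
        if j == i:
            continue
        weights[j] = math.exp(-feature_space_sq_dist(xi, xj, kernel) / sigma)
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# Spatial-locality assumption: predict example i's label distribution as the
# neighbor-weighted average of its neighbors' (one-hot) label vectors.
def propagate_labels(i, points, labels, sigma=1.0):
    probs = neighborhood_probs(i, points, sigma)
    n_labels = len(labels[0])
    return [sum(p * labels[j][c] for j, p in probs.items()) for c in range(n_labels)]

if __name__ == "__main__":
    # Toy data: two well-separated clusters, each carrying one label.
    points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
    labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(propagate_labels(0, points, labels))
```

On the toy data, the first example's nearest neighbor in feature space carries the first label, so the predicted distribution puts most of its mass there. With an RBF kernel, feature-space squared distances are bounded by 2, so `sigma` has to be small relative to that bound for the weighting to be sharply local.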


Pages: 365-379

Page count: 15

Number of references: 58
