Kernel-Based Mixture Mapping for Image and Text Association

Cited by: 4
Authors
Du, Youtian [1]
Wang, Xue [1]
Cui, Yunbo [1]
Wang, Hang [1]
Su, Chang [2]
Affiliations
[1] Xi'an Jiaotong University, Key Laboratory of Intelligent Networks and Network Security, Ministry of Education, Xi'an 710049, Shaanxi, People's Republic of China
[2] Cornell University, Weill Cornell Medicine, Department of Healthcare Policy and Research, New York, NY 10065, USA
Keywords
Semantics; Correlation; Probabilistic logic; Visualization; Data models; Analytical models; Optimization; Image-text association; semantic correlation; probabilistic mixture model; kernel-based mapping; MODEL; PROPAGATION; ANNOTATION; RETRIEVAL
DOI
10.1109/TMM.2019.2930336
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Modeling the relationships among multimodal media, such as images, videos, and text, can narrow the gap between modalities and benefit applications such as cross-media retrieval and image annotation. In this paper, we propose a new approach, called kernel-based mixture mapping (KMM), to model the semantic correlations between web images and text. First, we construct latent high-dimensional feature spaces based on kernel theory to address the nonlinearity of both the data distributions in the input spaces and the cross-modal correlation. Second, we present a probabilistic neighborhood model that describes the spatial locality of semantics, under the assumption that nearby examples in the feature spaces generally share the same semantics, together with a conditional model that describes the cross-modal conditional dependency. Finally, we build a probabilistic mixture model that jointly models the spatial locality of semantics and the conditional dependency between modalities. By combining nonlinear transformations with probabilistic models, KMM can address the nonlinearity of cross-modal correlation, the complexity of semantic distributions at the global scale, and the continuity of semantic distributions at the local scale. We present a hybrid optimization algorithm based on expectation-maximization and subgradient ascent to solve KMM; this algorithm avoids estimating the parameters of KMM in the high-dimensional feature space and is proved to converge to a locally optimal solution. We evaluate KMM on four public datasets. The experimental results show that our approach outperforms the compared methods in modeling the relationships between images and text.
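Two ingredients named in the abstract can be illustrated in isolation: distances in a kernel-induced feature space, which can be evaluated without ever forming the high-dimensional features, and a probabilistic neighborhood in which nearby examples tend to share semantics. The following is a minimal sketch, not the authors' KMM: it omits the cross-modal conditional model, the mixture, and the EM/subgradient optimization. It assumes a Gaussian RBF kernel, and the names rbf_kernel, feature_space_sq_dist, neighbor_probabilities, neighbor_label_distribution and the parameters gamma and temperature are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); assumed kernel choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def feature_space_sq_dist(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Squared distance ||phi(x) - phi(y)||^2 in the implicit feature space.

    Kernel trick: the feature map phi is never built explicitly;
    the distance is k(x, x) + k(y, y) - 2 k(x, y).
    """
    d = rbf_kernel(x, x, gamma) + rbf_kernel(y, y, gamma) - 2.0 * rbf_kernel(x, y, gamma)
    return max(d, 0.0)  # clamp tiny negative values caused by rounding


def neighbor_probabilities(
    points: Sequence[Vector], i: int, gamma: float = 0.5, temperature: float = 1.0
) -> List[float]:
    """Soft neighborhood of example i (hypothetical form).

    p(j | i) is proportional to exp(-||phi(x_i) - phi(x_j)||^2 / temperature),
    with p(i | i) = 0 so an example is not its own neighbor.
    """
    weights = [
        0.0 if j == i
        else math.exp(-feature_space_sq_dist(points[i], p, gamma) / temperature)
        for j, p in enumerate(points)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def neighbor_label_distribution(
    points: Sequence[Vector],
    labels: Sequence[str],
    i: int,
    gamma: float = 0.5,
    temperature: float = 1.0,
) -> Dict[str, float]:
    """Locality assumption: estimate the semantics of example i from its
    neighbors' labels, weighted by the neighborhood probabilities."""
    dist: Dict[str, float] = {}
    for p, label in zip(neighbor_probabilities(points, i, gamma, temperature), labels):
        dist[label] = dist.get(label, 0.0) + p
    return dist


if __name__ == "__main__":
    image_features = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [3.0, 0.0]]
    labels = ["cat", "cat", "dog", "dog"]
    print(neighbor_probabilities(image_features, 0))
    print(neighbor_label_distribution(image_features, labels, 0))
```

With the example points above, most of the neighborhood mass of the first point falls on its close neighbor, so its neighbor label distribution favors "cat". Because the squared RBF feature-space distance is bounded by 2, distant points receive nearly equal weight; a smaller temperature sharpens the neighborhood.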
Pages: 365-379
Number of pages: 15