Scalable multi-label canonical correlation analysis for cross-modal retrieval

Cited by: 17

Authors:

Shu, Xin [1,2]

Zhao, Guoying [2]

Affiliations:

[1] Nanjing Agr Univ, Coll Artificial Intelligence, 1 Wei Gang, Nanjing, Peoples R China

[2] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland

Source:

PATTERN RECOGNITION | 2021 / Vol. 115

Funding:

Academy of Finland; National Natural Science Foundation of China;

Keywords:

Canonical correlation analysis; Semantic transformation; Cross-modal retrieval; Singular value decomposition;

DOI:

10.1016/j.patcog.2021.107905

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-label canonical correlation analysis (ml-CCA) has been developed for cross-modal retrieval. However, computing ml-CCA involves the eigendecomposition of dense matrices, which can be computationally expensive. In addition, ml-CCA takes only semantic correlation into account and ignores cross-modal feature correlation. In this paper, we propose a novel framework that integrates semantic correlation and feature correlation simultaneously for cross-modal retrieval. By using the semantic transformation, we show that our model avoids computing the covariance matrix explicitly, which saves substantial computational cost. Further analysis shows that the proposed method can be solved via singular value decomposition, which has linear time complexity. Experimental results on three multi-label datasets demonstrate the accuracy and efficiency of the proposed method. © 2021 Elsevier Ltd. All rights reserved.
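As background for the abstract's remark about avoiding explicit covariance matrices, the following is a minimal sketch of classical two-view CCA computed through singular value decompositions of the centered data matrices. It is not the scalable ml-CCA method proposed in the paper: it uses no label information and no semantic transformation, and the function name `cca_via_svd` and the parameters `k` and `eps` are assumptions introduced only for illustration.

```python
import numpy as np

def cca_via_svd(X, Y, k, eps=1e-10):
    """Classical CCA for two views, without forming covariance matrices.

    X is (n_samples, p), Y is (n_samples, q). Returns projection matrices
    Wx (p, k), Wy (q, k) and the top-k canonical correlations.
    Illustrative only; not the method of the paper.
    """
    # Center each view so that inner products correspond to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Thin SVD of each view gives an orthonormal basis for its column space.
    Ux, sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, Vyt = np.linalg.svd(Yc, full_matrices=False)

    # Drop numerically rank-deficient directions before inverting sx, sy.
    keep_x = sx > eps * sx[0]
    keep_y = sy > eps * sy[0]
    Ux, sx, Vxt = Ux[:, keep_x], sx[keep_x], Vxt[keep_x]
    Uy, sy, Vyt = Uy[:, keep_y], sy[keep_y], Vyt[keep_y]

    # Singular values of Ux^T Uy are the canonical correlations.
    A, rho, Bt = np.linalg.svd(Ux.T @ Uy, full_matrices=False)

    # Map the rotations back to the original feature coordinates.
    Wx = Vxt.T @ (A[:, :k] / sx[:, None])
    Wy = Vyt.T @ (Bt.T[:, :k] / sy[:, None])
    return Wx, Wy, rho[:k]
```

In a retrieval setting, items from both modalities would be centered, projected with `Wx` and `Wy`, and compared in the shared k-dimensional space, for example by cosine similarity.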

Pages: 10