Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Cited by: 44
Authors
Hua, Yan [1 ,2 ,3 ]
Wang, Shuhui [1 ,3 ]
Liu, Siyuan [4 ,5 ]
Cai, Anni [2 ]
Huang, Qingming [1 ,6 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intellectual Informat Proc, Beijing 100190, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[3] Commun Univ China, Sch Informat Engn, Beijing 100024, Peoples R China
[4] Penn State Univ, Smeal Coll Business, University Pk, PA 16801 USA
[5] Shenzhen Inst Adv Technol, Inst Adv Comp & Digital Engn, Ctr Cloud Comp, Shenzhen 518055, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation
Keywords
Cross-modal retrieval; localized correlation learning; semantic hierarchy; SIMILARITY;
DOI
10.1109/TMM.2016.2535864
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the explosive growth of web data, effective and efficient technologies are urgently needed for retrieving semantically relevant content across heterogeneous modalities. Previous studies have focused on modeling simple cross-modal statistical dependencies and on globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot appropriately adapt to diverse content, and they ignore the multilevel semantic relations that naturally exist in web data. We study the problem of semantically coherent retrieval, where documents from different modalities should be ranked by their semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method based on adaptive hierarchical semantic aggregation. First, by jointly modeling content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, with a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation: local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize a structural risk objective function that involves semantic coherence measurement, local projection consistency, and a complexity penalty on the local projections. Compared with existing approaches, TINA achieves a better bias-variance tradeoff in real-world cross-modal correlation learning tasks. Extensive experiments on the widely used NUS-WIDE and ICML-Challenge datasets for image-text retrieval demonstrate that TINA adapts better to multilevel semantic relations and content divergence and thus outperforms the state of the art with better semantic coherence.
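The abstract names two ways of combining local experts but gives no formulas. The following is a minimal sketch of how the two paradigms could differ; it is not the paper's formulation. The softmax gating over distances to per-expert centers, the pairing of every image expert with every text expert, and all function and variable names are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Apply one local linear projection W (list of rows) to vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def memberships(x, centers):
    # Assumed gating: probability of each local expert given x, from a
    # softmax over negative squared distance to a hypothetical expert center.
    return softmax([-sq_dist(x, c) for c in centers])

def aggregate_projection(x, projections, centers):
    # Membership-weighted average of the locally projected points.
    p = memberships(x, centers)
    outs = [matvec(W, x) for W in projections]
    return [sum(pk * out[d] for pk, out in zip(p, outs))
            for d in range(len(outs[0]))]

def distance_via_projection_aggregation(x, y, proj_x, proj_y, cx, cy):
    # Aggregate the projections first, then measure one distance.
    return sq_dist(aggregate_projection(x, proj_x, cx),
                   aggregate_projection(y, proj_y, cy))

def distance_via_distance_aggregation(x, y, proj_x, proj_y, cx, cy):
    # Measure a distance for each pair of local experts, then aggregate.
    px, py = memberships(x, cx), memberships(y, cy)
    total = 0.0
    for pk, Wx in zip(px, proj_x):
        fx = matvec(Wx, x)
        for pl, Wy in zip(py, proj_y):
            total += pk * pl * sq_dist(fx, matvec(Wy, y))
    return total

# Toy data: a 3-d "image" feature, a 2-d "text" feature, two experts per modality.
x = [1.0, 0.5, -0.2]
y = [0.3, -1.0]
proj_x = [[[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]],
          [[0.5, 0.5, 0.0], [-1.0, 0.0, 1.0]]]
proj_y = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]]
centers_x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
centers_y = [[0.0, -1.0], [1.0, 1.0]]

d_proj = distance_via_projection_aggregation(x, y, proj_x, proj_y, centers_x, centers_y)
d_dist = distance_via_distance_aggregation(x, y, proj_x, proj_y, centers_x, centers_y)
```

In this sketch the distance-aggregated value is never smaller than the projection-aggregated one, because the squared Euclidean norm is convex and both use the same membership weights. With a single expert per modality the two coincide.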
Pages: 1201-1216
Number of pages: 16
Related papers
50 in total
  • [1] TINA: Cross-modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation
    Hua, Yan
    Wang, Shuhui
    Liu, Siyuan
    Huang, Qingming
    Cai, Anni
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014, : 190 - 199
  • [2] Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation (vol 18, pg 1201, 2016)
    Hua, Yan
    Wang, Shuhui
    Liu, Siyuan
    Cai, Anni
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (10) : 2127 - 2127
  • [3] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255
  • [4] Cross-modal semantic correlation learning by Bi-CNN network
    Wang, Chaoyi
    Li, Liang
    Yan, Chenggang
    Wang, Zhan
    Sun, Yaoqi
    Zhang, Jiyong
    IET IMAGE PROCESSING, 2021, 15 (14) : 3674 - 3684
  • [5] Analyzing semantic correlation for cross-modal retrieval
    Liang Xie
    Peng Pan
    Yansheng Lu
    Multimedia Systems, 2015, 21 : 525 - 539
  • [6] Analyzing semantic correlation for cross-modal retrieval
    Xie, Liang
    Pan, Peng
    Lu, Yansheng
    MULTIMEDIA SYSTEMS, 2015, 21 (06) : 525 - 539
  • [7] CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network
    Peng, Yuxin
    Qi, Jinwei
    Huang, Xin
    Yuan, Yuxin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (02) : 405 - 420
  • [8] Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval
    Zeng, Sheng
    Liu, Changhong
    Zhou, Jun
    Chen, Yong
    Jiang, Aiwen
    Li, Hanxi
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 239 - 248
  • [9] Learning Shared Semantic Space with Correlation Alignment for Cross-Modal Event Retrieval
    Yang, Zhenguo
    Lin, Zehang
    Kang, Peipei
    Lv, Jianming
    Li, Qing
    Liu, Wenyin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (01)
  • [10] Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval
    Zhu, Lei
    Song, Jiayu
    Zhu, Xiaofeng
    Zhang, Chengyuan
    Zhang, Shichao
    Yuan, Xinpan
    IEEE MULTIMEDIA, 2020, 27 (04) : 79 - 90