Common Semantic Representation Method Based on Object Attention and Adversarial Learning for Cross-Modal Data in IoV

Cited by: 15
Authors
Kou, Feifei [1 ]
Du, Junping [1 ]
Cui, Wanqiu [1 ]
Shi, Lei [1 ]
Cheng, Pengchao [1 ]
Chen, Jiannan [1 ]
Li, Jinxuan [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing Key Lab Intelligent Telecommun Software & Multimedia, Beijing 100876, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Cross-modal data; GAN; attention model; Internet of Vehicles;
DOI
10.1109/TVT.2018.2890405
CLC classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject classification code
0808; 0809;
Abstract
With the significant development of the Internet of Vehicles (IoV), data of various modalities, such as images and text, are emerging, providing data support for high-quality vehicle networking services. To make full use of such cross-modal data, we need to establish a common semantic representation that enables effective measurement and comparison of data from different modalities. However, because cross-modal data follow heterogeneous distributions, a semantic gap exists between them. Although several deep neural network (DNN) based methods have been proposed to address this problem, challenges remain in the quality of the modality-specific features, the structure of the DNN, and the components of the loss function. In this paper, to represent cross-modal data in IoV, we propose a common semantic representation method based on object attention and adversarial learning (OAAL). To acquire high-quality modality-specific features, OAAL uses an object attention mechanism that links the cross-modal features effectively. To further alleviate the heterogeneous semantic gap, we construct a cross-modal generative adversarial network consisting of two parts: a generative model and a discriminative model. We also design a comprehensive loss function for the generative model so that it produces high-quality features. Through a minimax game between the two models, we construct a shared semantic space and generate unified representations for the cross-modal data. Finally, we apply OAAL to a retrieval task, and the experimental results verify its effectiveness.
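
The abstract outlines OAAL's core idea: object-attention-weighted, modality-specific features are projected by a generative model into a shared semantic space, while a discriminative model tries to identify each projection's source modality, and the two are trained in a minimax game. The PyTorch-style sketch below illustrates only that adversarial projection step under stated assumptions; it is not the authors' implementation, and the feature dimensions, network sizes, and cosine-similarity alignment term are placeholders (the attention-weighted image features and the text features are assumed to be precomputed vectors).

import torch
import torch.nn as nn


class Generator(nn.Module):
    """Maps modality-specific features into a shared semantic space."""
    def __init__(self, img_dim=4096, txt_dim=300, common_dim=256):
        super().__init__()
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.ReLU(), nn.Linear(1024, common_dim))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_dim, 1024), nn.ReLU(), nn.Linear(1024, common_dim))

    def forward(self, img_feat, txt_feat):
        return self.img_proj(img_feat), self.txt_proj(txt_feat)


class Discriminator(nn.Module):
    """Predicts which modality a common-space vector came from."""
    def __init__(self, common_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(common_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, z):
        return self.net(z)  # logit; target 1 = image, 0 = text


def train_step(gen, disc, opt_g, opt_d, img_feat, txt_feat):
    """One adversarial update on a batch of paired image/text features."""
    bce = nn.BCEWithLogitsLoss()
    img_z, txt_z = gen(img_feat, txt_feat)
    ones = torch.ones(img_feat.size(0), 1)
    zeros = torch.zeros(txt_feat.size(0), 1)

    # Discriminative model: learn to tell the two modalities apart.
    d_loss = bce(disc(img_z.detach()), ones) + bce(disc(txt_z.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generative model: fool the discriminator (modality-invariant codes),
    # plus an illustrative alignment term pulling paired embeddings together;
    # the paper's "comprehensive loss" is richer than this single term.
    adv_loss = bce(disc(img_z), zeros) + bce(disc(txt_z), ones)
    align_loss = (1 - nn.functional.cosine_similarity(img_z, txt_z)).mean()
    g_loss = adv_loss + align_loss
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()


# Example usage with random stand-in features (batch of 8 image-text pairs):
# gen, disc = Generator(), Discriminator()
# opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
# opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
# train_step(gen, disc, opt_g, opt_d, torch.randn(8, 4096), torch.randn(8, 300))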
Pages: 11588 - 11598
Number of pages: 11