With the rapid development of the Internet of Vehicles (IoV), data of various modalities, such as images and text, are emerging, providing data support for high-quality IoV services. To make full use of these cross-modal data, we need to establish a common semantic representation that enables effective measurement and comparison across modalities. However, because cross-modal data follow heterogeneous distributions, a semantic gap exists between them. Although several deep neural network (DNN) based methods have been proposed to address this problem, challenges remain in the quality of the modality-specific features, the structure of the DNN, and the design of the loss function. In this paper, to represent cross-modal data in IoV, we propose a common semantic representation method based on object attention and adversarial learning (OAAL). To acquire high-quality modality-specific features, OAAL employs an object attention mechanism that links cross-modal features effectively. To further alleviate the heterogeneous semantic gap, we construct a cross-modal generative adversarial network consisting of two parts: a generative model and a discriminative model. In addition, we design a comprehensive loss function that guides the generative model to produce high-quality features. Through a minimax game between the two models, we construct a shared semantic space and generate unified representations for cross-modal data. Finally, we apply OAAL to a cross-modal retrieval task, and the experimental results verify its effectiveness.
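The abstract leaves the exact form of the minimax game unstated; as a sketch, the standard adversarial objective that a cross-modal GAN of this kind typically builds on is

\[
\min_{G}\,\max_{D}\; V(D, G) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big],
\]

where, in this cross-modal setting, the generative model \(G\) would map modality-specific features into the shared semantic space and the discriminative model \(D\) would attempt to identify the source modality of a representation; the comprehensive loss function mentioned above presumably augments this adversarial term with additional components defined in the body of the paper.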