Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

Cited by: 42
Authors
Li, Xiang [1 ,2 ]
Wang, Chao [1 ,2 ]
Tan, Jiwei [1 ,2 ]
Zeng, Xiaoyi [1 ,2 ]
Ou, Dan [1 ,2 ]
Zheng, Bo [1 ,2 ]
Affiliations
[1] Alibaba Grp, Hangzhou, Peoples R China
[2] Alibaba Grp, Beijing, Peoples R China
Source
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) | 2020
Keywords
multimodal learning; adversarial learning; recurrent neural network; attention; representation learning; e-commerce search;
DOI
10.1145/3366423.3380163
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although extensive CTR prediction models have been proposed, learning good representations of items from multimodal features remains under-investigated, considering that an item in E-commerce usually comprises multiple heterogeneous modalities. Previous works either concatenate the multiple modality features, which is equivalent to assigning a fixed importance weight to each modality, or learn dynamic weights of different modalities for different items through techniques such as the attention mechanism. However, there usually exists common redundant information across multiple modalities, and dynamic modality weights computed from this redundant information may not correctly reflect the true importance of each modality. To address this, we explore the complementarity and redundancy of modalities by treating modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of the multiple modalities for each item according to its modality-specific features. A multimodal adversarial network then learns modality-invariant representations, for which a double-discriminator strategy is introduced. Finally, we obtain the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements over state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system, where online A/B testing further demonstrates its effectiveness.
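The fusion scheme described in the abstract — attention weights computed from modality-specific features only, plus an adversarially trained modality-invariant component — can be sketched roughly as follows. This is a minimal PyTorch illustration, not the authors' implementation: the class and variable names, layer sizes, and the use of a single discriminator (the paper introduces a double-discriminator strategy) are all assumptions, and the adversarial training loop is omitted.

```python
import torch
import torch.nn as nn

class MARNSketch(nn.Module):
    """Hypothetical sketch of MARN-style multimodal fusion (names illustrative)."""

    def __init__(self, modality_dims, specific_dim=32, invariant_dim=32):
        super().__init__()
        # Each modality is split into a modality-specific part and a
        # candidate modality-invariant part.
        self.specific = nn.ModuleList(nn.Linear(d, specific_dim) for d in modality_dims)
        self.invariant = nn.ModuleList(nn.Linear(d, invariant_dim) for d in modality_dims)
        # Attention scores come from modality-specific features only, so
        # information shared (redundant) across modalities cannot inflate
        # a modality's weight.
        self.attn = nn.ModuleList(nn.Linear(specific_dim, 1) for _ in modality_dims)
        # A discriminator tries to identify the source modality of an
        # invariant vector; the encoders would be trained adversarially
        # against it (training loop not shown).
        self.discriminator = nn.Sequential(
            nn.Linear(invariant_dim, 16), nn.ReLU(),
            nn.Linear(16, len(modality_dims)))

    def forward(self, inputs):
        specific = [enc(x) for enc, x in zip(self.specific, inputs)]
        invariant = [enc(x) for enc, x in zip(self.invariant, inputs)]
        # Dynamic per-item modality weights from modality-specific features.
        scores = torch.cat([a(s) for a, s in zip(self.attn, specific)], dim=-1)
        weights = torch.softmax(scores, dim=-1)          # (batch, num_modalities)
        weighted_specific = sum(w.unsqueeze(-1) * s
                                for w, s in zip(weights.unbind(-1), specific))
        # If the invariant parts are truly modality-agnostic, a simple
        # average is a reasonable fusion.
        fused_invariant = torch.stack(invariant).mean(dim=0)
        item_repr = torch.cat([weighted_specific, fused_invariant], dim=-1)
        return item_repr, invariant
```

The resulting `item_repr` concatenates the attention-weighted modality-specific part with the fused modality-invariant part, matching the abstract's "combining both" step; it would then feed a downstream CTR model.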
Pages: 827-836
Number of pages: 10