ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest

Cited by: 16
Authors
Baltescu, Paul [1 ]
Chen, Haoyu [1 ]
Pancha, Nikil [1 ]
Zhai, Andrew [1 ]
Leskovec, Jure [1 ]
Rosenberg, Charles [1 ]
Affiliations
[1] Pinterest, San Francisco, CA 94107 USA
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Keywords
Representation Learning; Multi-Task Learning; Multi-Modal Learning; Recommender Systems
DOI
10.1145/3534678.3539170
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Learned embeddings for products are an important building block for web-scale e-commerce recommendation systems. At Pinterest, we build a single set of product embeddings called ItemSage to provide relevant recommendations in all shopping use cases, including user-, image-, and search-based recommendations. This approach has led to significant improvements in engagement and conversion metrics, while reducing both infrastructure and maintenance cost. While most prior work focuses on building product embeddings from features coming from a single modality, we introduce a transformer-based architecture capable of aggregating information from both text and image modalities, and show that it significantly outperforms single-modality baselines. We also apply multi-task learning to optimize ItemSage for several engagement types, leading to a candidate generation system that is efficient for all of the engagement objectives of the end-to-end recommendation system. Extensive offline experiments illustrate the effectiveness of our approach, and results from online A/B experiments show substantial gains in key business metrics (up to +7% gross merchandise value/user and +11% click volume).
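The abstract's core idea can be illustrated with a minimal sketch: project per-product image and text features into a shared space, aggregate them with a small transformer encoder, and read a single L2-normalized product embedding off a learned global token. This is not Pinterest's implementation; all dimensions, layer sizes, and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ItemSageSketch(nn.Module):
    """Hypothetical sketch of a multi-modal product-embedding model:
    a transformer aggregates image and text tokens into one embedding
    that downstream multi-task heads (not shown) could consume."""

    def __init__(self, img_dim=512, txt_dim=256, d_model=128, out_dim=64):
        super().__init__()
        # Project each modality into a shared token space (dims assumed).
        self.img_proj = nn.Linear(img_dim, d_model)
        self.txt_proj = nn.Linear(txt_dim, d_model)
        # Learned [CLS]-style global token; its output is the embedding.
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, n_img, img_dim); txt_feats: (B, n_txt, txt_dim)
        tokens = torch.cat([
            self.cls.expand(img_feats.size(0), -1, -1),
            self.img_proj(img_feats),
            self.txt_proj(txt_feats),
        ], dim=1)
        encoded = self.encoder(tokens)
        # L2-normalize so dot products behave like cosine similarity,
        # which suits nearest-neighbor candidate retrieval.
        return F.normalize(self.head(encoded[:, 0]), dim=-1)

model = ItemSageSketch()
emb = model(torch.randn(2, 1, 512), torch.randn(2, 3, 256))
print(emb.shape)  # torch.Size([2, 64])
```

In the multi-task setting the paper describes, one would train such an embedding against several engagement objectives (e.g. clicks and purchases) simultaneously, so a single embedding serves every objective at candidate-generation time.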
Pages: 2703-2711
Page count: 9