Dual-view multi-modal contrastive learning for graph-based recommender systems

Cited by: 15
Authors
Guo, Feipeng [1 ,2 ]
Wang, Zifan [1 ,2 ]
Wang, Xiaopeng [2 ,3 ]
Lu, Qibei [4 ]
Ji, Shaobo [5 ]
Affiliations
[1] Zhejiang Gongshang Univ, Sch Management & E Business, Hangzhou 310018, Peoples R China
[2] Zhejiang Gongshang Univ, Modern Business Res Ctr, Hangzhou 310018, Peoples R China
[3] Zhejiang Gongshang Univ, Sch Business Adm, Hangzhou 310018, Peoples R China
[4] Zhejiang Int Studies Univ, Sch Int Business, Hangzhou 310023, Peoples R China
[5] Carleton Univ, Sprott Sch Business, Ottawa, ON K1S 5B6, Canada
Keywords
Multi-modal; Self-supervised learning; Recommender systems; Contrastive learning; Graph neural network
DOI
10.1016/j.compeleceng.2024.109213
CLC number
TP3 [Computing Technology, Computer Technology];
Discipline code
0812;
Abstract
Personalized recommender systems play a crucial role in online content-sharing platforms (e.g., TikTok). Learning representations for multi-modal content is pivotal in current graph-based recommender systems. Existing works aim to enhance recommendation accuracy by leveraging multi-modal features (e.g., image, sound, text) as side information for items. However, this approach falls short of fully discerning users' fine-grained preferences across different modalities. To tackle this limitation, this paper introduces the Dual-view Multi-Modal contrastive learning Recommendation model (DMM-Rec). DMM-Rec employs self-supervised learning to guide the learning of user and item representations in the multi-modal context. Specifically, to capture users' preferences for individual modalities, we propose specific-modal contrastive learning. To capture users' cross-modal preferences, cross-modal contrastive learning is introduced to uncover interdependencies in users' preferences across modalities. These contrastive learning tasks not only adaptively explore potential relations between modalities but also alleviate the data-sparsity challenge in recommender systems. Extensive experiments on three datasets against ten baselines demonstrate that DMM-Rec outperforms the strongest baseline by an average of 6.81%. These results underscore the effectiveness of considering multi-modal content in improving recommender systems.
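For readers who want a concrete picture of the dual-view objective described in the abstract, the following is a minimal, hypothetical PyTorch sketch of specific-modal and cross-modal contrastive losses expressed as InfoNCE. The embedding names, the augmentation, and the unweighted sum of the two losses are illustrative assumptions, not the paper's actual implementation (joint training with the recommendation objective and the graph encoders are omitted).

```python
# Minimal, illustrative sketch (NOT the authors' released code): how the two
# contrastive views described in the abstract could be written as InfoNCE
# losses over modality-specific user embeddings. All tensor names, the
# augmentation, and the loss combination are assumptions for illustration.
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE with in-batch negatives: row i of `anchor` is pulled toward
    row i of `positive` and pushed away from every other row."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature            # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)


def specific_modal_loss(user_modal: torch.Tensor, user_modal_aug: torch.Tensor) -> torch.Tensor:
    # Specific-modal view: contrast a user's embedding in one modality with an
    # augmented view of the same modality (hypothetical graph/feature dropout).
    return info_nce(user_modal, user_modal_aug)


def cross_modal_loss(user_visual: torch.Tensor, user_textual: torch.Tensor) -> torch.Tensor:
    # Cross-modal view: align the same user's embeddings across two modalities
    # so that preferences expressed in one modality inform the other.
    return info_nce(user_visual, user_textual)


if __name__ == "__main__":
    B, d = 256, 64                                    # batch size and embedding dim (assumed)
    u_vis = torch.randn(B, d)                         # visual-modality user embeddings
    u_vis_aug = u_vis + 0.1 * torch.randn(B, d)       # perturbed view of the same modality
    u_txt = torch.randn(B, d)                         # textual-modality user embeddings

    ssl_loss = specific_modal_loss(u_vis, u_vis_aug) + cross_modal_loss(u_vis, u_txt)
    print(f"combined self-supervised loss: {ssl_loss.item():.4f}")
```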
Pages: 16