MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

Cited by: 422
Authors
Wei, Yinwei [1 ]
Wang, Xiang [2 ]
Nie, Liqiang [1 ]
He, Xiangnan [3 ]
Hong, Richang [4 ]
Chua, Tat-Seng [2 ]
Affiliations
[1] Shandong Univ, Jinan, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Hefei Univ Technol, Hefei, Peoples R China
Source
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China;
Keywords
Graph Convolution Network; Multi-modal Recommendation; Micro-video Understanding;
DOI
10.1145/3343031.3351034
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Subject classification code
081203; 0835;
Abstract
Personalized recommendation plays a central role in many online content sharing platforms. To provide a quality micro-video recommendation service, it is of crucial importance to consider the interactions between users and items (i.e., micro-videos) as well as the item contents from various modalities (e.g., visual, acoustic, and textual). Existing works on multimedia recommendation largely exploit multi-modal contents to enrich item representations, while less effort is made to leverage the information interchange between users and items to enhance user representations and capture users' fine-grained preferences on different modalities. In this paper, we propose to exploit user-item interactions to guide the representation learning in each modality, which in turn improves personalized micro-video recommendation. We design a Multi-modal Graph Convolution Network (MMGCN) framework built upon the message-passing idea of graph neural networks, which yields modal-specific representations of users and micro-videos to better capture user preferences. Specifically, we construct a user-item bipartite graph in each modality and enrich the representation of each node with the topological structure and features of its neighbors. Through extensive experiments on three publicly available datasets, Tiktok, Kwai, and MovieLens, we demonstrate that our proposed model significantly outperforms state-of-the-art multi-modal recommendation methods.
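The abstract outlines the core mechanism: for each modality, MMGCN performs message passing over a user-item bipartite graph so that every node aggregates the features of its neighbors. The following is a minimal PyTorch sketch of what one such modality-specific propagation layer could look like; it is an illustrative assumption based only on the abstract, not the authors' released implementation, and the names (ModalGCNLayer, adj_norm) are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalGCNLayer(nn.Module):
    # One propagation step on the bipartite user-item graph of a single
    # modality (visual, acoustic, or textual), per the abstract's description.
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.agg = nn.Linear(in_dim, out_dim)        # transform aggregated neighbor features
        self.self_loop = nn.Linear(in_dim, out_dim)  # transform the node's own features

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # x:        (num_users + num_items, in_dim) modality-specific node features
        # adj_norm: normalized sparse adjacency of the user-item bipartite graph
        neighbor_msg = torch.sparse.mm(adj_norm, x)  # aggregate features from neighbors
        return F.leaky_relu(self.agg(neighbor_msg) + self.self_loop(x))

Stacking such layers per modality and then combining the resulting modality-specific user and item representations (e.g., by summation) to score user-item pairs would follow the high-level design described above.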
Pages: 1437-1445
Number of pages: 9