YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation

Times cited: 1
Authors
Ma, Le [1]
Wu, Xinda [1]
Tang, Ruiyuan [1]
Zhong, Chongjun [1]
Zhang, Kejun [1, 2]
Affiliations
[1] Zhejiang University, Hangzhou, People's Republic of China
[2] Innovation Center of Yangtze River Delta, Shanghai, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Multi-modal; Music recommendation; Canonical correlation analysis
DOI
10.1186/s13636-023-00306-6
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Appropriate background music in e-commerce advertisements can help stimulate consumption and build a product's image. However, many factors, such as emotion and product category, must be taken into account. As a result, selecting music manually is time-consuming and requires professional knowledge, and this makes automatic music recommendation for videos crucial. Because no e-commerce advertisement dataset existed, we first build Commercial-98K, a large-scale e-commerce advertisement dataset covering the major e-commerce categories. We then propose YuYin, a video-music retrieval model that learns the correlation between video and music. We introduce a weighted fusion module (WFM) that fuses emotion features and audio features extracted from music to obtain a more fine-grained music representation. Music tends to be similar within the same product category, so YuYin is trained with multi-task learning. It learns the video-music correlation by cross-matching video, music, and tags, and it also performs a category prediction task. Extensive experiments show that YuYin achieves a remarkable improvement in video-music retrieval on Commercial-98K.
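The abstract does not say how the WFM weights its two inputs or which losses drive the cross-matching and category tasks. The following Python sketch illustrates the general idea under explicit assumptions. It is not the authors' implementation. It assumes the following:

- The audio and emotion embeddings share one dimension.
- The WFM is a softmax gate over two learnable logits (the hypothetical `gate_logits`).
- Cross-matching uses a cosine-similarity InfoNCE contrastive loss, a standard technique swapped in here for illustration.
- Category prediction uses cross-entropy.
- All task weights are equal by default (hypothetical).

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(audio: Vector, emotion: Vector, gate_logits: Sequence[float]) -> Vector:
    """Hypothetical WFM: convex combination of audio and emotion embeddings.

    In a trained model gate_logits would be learned parameters; here they are inputs.
    """
    if len(audio) != len(emotion):
        raise ValueError("audio and emotion embeddings must have the same dimension")
    w_audio, w_emotion = softmax(gate_logits)
    return [w_audio * a + w_emotion * e for a, e in zip(audio, emotion)]


def cosine(u: Vector, v: Vector) -> float:
    # Assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float = 0.07) -> float:
    """Contrastive loss: query i should match key i; other keys in the batch are negatives."""
    loss = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_sum - logits[i]
    return loss / len(queries)


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Negative log-probability of the target class under a softmax.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def multitask_loss(
    video: List[Vector],
    music: List[Vector],
    tags: List[Vector],
    category_logits: List[List[float]],
    category_targets: List[int],
    weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),  # hypothetical
) -> float:
    """Sum of three cross-matching losses plus a category-prediction loss."""
    w_vm, w_vt, w_mt, w_cls = weights
    l_vm = info_nce(video, music)  # video <-> music
    l_vt = info_nce(video, tags)   # video <-> tag
    l_mt = info_nce(music, tags)   # music <-> tag
    l_cls = sum(cross_entropy(lg, t) for lg, t in zip(category_logits, category_targets))
    l_cls /= len(category_targets)
    return w_vm * l_vm + w_vt * l_vt + w_mt * l_mt + w_cls * l_cls
```

The softmax gate keeps both fusion weights positive and summing to one, so the fused vector stays on the same scale as its inputs. In this sketch, the `music` batch passed to `multitask_loss` would hold the outputs of `weighted_fusion` for each track.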
Pages: 13