Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction

Cited by: 1
Authors
Cao, Tianwei [1 ]
Xu, Qianqian [2 ]
Yang, Zhiyong [1 ]
Huang, Qingming [2 ,3 ,4 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101408, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Key Lab Big Data Min & Knowledge Management BDKM, Beijing 101408, Peoples R China
[4] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Feature extraction; Predictive models; Wrapping; Frequency modulation; Computational modeling; Training; Recommender systems; Click-through rate prediction; recommender system; bilevel optimization; meta-learning;
DOI
10.1109/TPAMI.2021.3103741
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user will click on an item, has become increasingly significant in recommender systems. Recently, deep learning models that automatically extract user interest from historical behaviors have achieved great success. In these works, an attention mechanism is used to select the items a user is interested in from historical behaviors, improving the performance of the CTR predictor. Normally, such attentive modules can be jointly trained with the base predictor via gradient descent. In this paper, we regard user interest modeling as a feature selection problem, which we call user interest selection. For this problem, we propose a novel approach under the framework of the wrapper method, named Meta-Wrapper. More specifically, we use a differentiable module as our wrapping operator and recast its learning problem as a continuous bilevel optimization. Moreover, we use a meta-learning algorithm to solve this optimization and theoretically prove its convergence. We also provide theoretical analysis showing that the proposed method 1) improves the efficiency of wrapper-based feature selection, and 2) achieves better resistance to overfitting. Finally, extensive experiments on three public datasets demonstrate the superiority of our method in boosting the performance of CTR prediction.
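To illustrate the idea described in the abstract (a differentiable wrapping operator over user behaviors, trained by bilevel optimization in a meta-learning style), the following PyTorch sketch is a minimal toy example. All names, shapes, the gate network, the synthetic data, and the one-step lookahead are illustrative assumptions; this is not the paper's actual Meta-Wrapper implementation.

```python
# Toy sketch: differentiable "user interest selection" gates (outer level)
# trained against the validation loss of a base CTR predictor (inner level).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, L, D, H = 8, 10, 16, 32   # batch size, behavior length, embed dim, hidden dim (assumed)

def sample_batch():
    # Synthetic stand-ins for behavior/item embeddings and click labels.
    behaviors = torch.randn(B, L, D)            # historical item embeddings
    target = torch.randn(B, D)                  # candidate item embedding
    label = torch.randint(0, 2, (B, 1)).float() # click / no click
    return behaviors, target, label

# Wrapper (outer level): soft selection gates over the behavior sequence.
v = (0.1 * torch.randn(D)).requires_grad_()     # hypothetical gate scorer

def select_interest(behaviors):
    gates = torch.sigmoid(behaviors @ v)                    # (B, L) soft mask
    return (gates.unsqueeze(-1) * behaviors).sum(dim=1)     # pooled interest (B, D)

# Base CTR predictor (inner level), written functionally so we can use "fast weights".
w1 = (0.1 * torch.randn(2 * D, H)).requires_grad_()
w2 = (0.1 * torch.randn(H, 1)).requires_grad_()

def predict(behaviors, target, params):
    p1, p2 = params
    x = torch.cat([select_interest(behaviors), target], dim=-1)
    return torch.relu(x @ p1) @ p2                          # logit (B, 1)

def bce(logit, label):
    return F.binary_cross_entropy_with_logits(logit, label)

inner_lr, outer_lr = 0.1, 0.01
wrapper_opt = torch.optim.Adam([v], lr=outer_lr)
predictor_opt = torch.optim.SGD([w1, w2], lr=inner_lr)

for step in range(100):
    tr_beh, tr_tgt, tr_y = sample_batch()   # training batch (inner objective)
    va_beh, va_tgt, va_y = sample_batch()   # validation batch (outer objective)

    # Inner step: one lookahead SGD update of the predictor on training data,
    # keeping the graph so gradients can flow back to the wrapper parameters.
    train_loss = bce(predict(tr_beh, tr_tgt, (w1, w2)), tr_y)
    g1, g2 = torch.autograd.grad(train_loss, (w1, w2), create_graph=True)
    fast = (w1 - inner_lr * g1, w2 - inner_lr * g2)

    # Outer (meta) step: the adapted predictor's validation loss drives the wrapper.
    val_loss = bce(predict(va_beh, va_tgt, fast), va_y)
    wrapper_opt.zero_grad()
    val_loss.backward()
    wrapper_opt.step()

    # Finally, update the real predictor on the training objective.
    predictor_opt.zero_grad()
    bce(predict(tr_beh, tr_tgt, (w1, w2)), tr_y).backward()
    predictor_opt.step()
```

The one-step lookahead above is the simplest way to obtain a hypergradient for the outer (wrapper) parameters; the paper's actual algorithm and its convergence analysis are more involved.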
Pages: 8449 - 8464
Page count: 16