GS2P: a generative pre-trained learning to rank model with over-parameterization for web-scale search

Cited by: 7
Authors
Li, Yuchen [1 ]
Xiong, Haoyi [2 ]
Kong, Linghe [1 ]
Bian, Jiang [2 ]
Wang, Shuaiqiang [2 ]
Chen, Guihai [1 ]
Yin, Dawei [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
Keywords
Learning to rank; Data reconstruction; Pre-training; Web search
DOI
10.1007/s10994-023-06469-9
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While learning to rank (LTR) is widely employed in web search to prioritize relevant webpages among the retrieved contents for an input query, traditional LTR models stumble over two principal obstacles that lead to subpar performance: (1) the scarcity of well-annotated query-webpage pairs with ranking scores, which limits their coverage of search queries across the popularity spectrum, and (2) ill-trained models incapable of inducing generalized representations for LTR, culminating in overfitting. To tackle these challenges, we propose a Generative Semi-Supervised Pre-trained (GS(2)P) learning-to-rank model. Specifically, GS(2)P first generates pseudo-labels for unlabeled samples using tree-based LTR models after a series of co-training procedures, then learns representations of query-webpage pairs with self-attentive transformers via both discriminative (LTR) and generative (denoising autoencoding for reconstruction) losses. Finally, GS(2)P boosts LTR performance by incorporating Random Fourier Features to over-parameterize the model into the "interpolating regime", so as to enjoy a further descent of generalization error with the learned representations. We conduct extensive offline experiments on a publicly available dataset and a real-world dataset collected from a large-scale search engine. The results show that GS(2)P achieves the best performance on both datasets compared to baselines. We also deploy GS(2)P at a large-scale web search engine with realistic traffic, where we still observe significant improvement in real-world applications. GS(2)P performs consistently in both online and offline experiments.
Pages: 5331-5349
Page count: 19
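The over-parameterization step named in the abstract, lifting learned representations into a much wider randomized feature space so a downstream linear ranker approaches the "interpolating regime", can be sketched with a standard Random Fourier Features map (Rahimi-Recht style). This is a minimal illustration, not the paper's implementation; the function name, dimensions, and RBF bandwidth `gamma` are all hypothetical choices.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma=1.0, seed=0):
    """Map X of shape (n_samples, d) into a randomized cosine feature
    space of width n_features, approximating an RBF kernel.
    Hypothetical sketch of the over-parameterization step."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the RBF kernel,
    # plus uniform random phases.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Lift a toy batch of 16-dim representations (e.g. for 8 query-webpage
# pairs) into a 1024-dim feature space; a linear scorer trained on Z
# then has far more parameters than samples.
X = np.random.default_rng(1).normal(size=(8, 16))
Z = random_fourier_features(X, n_features=1024)
print(Z.shape)  # (8, 1024)
```

Inner products of these features approximate the RBF kernel, so widening `n_features` trades memory for a finer kernel approximation while keeping scoring linear.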