GS2P: a generative pre-trained learning to rank model with over-parameterization for web-scale search

Cited by: 7
Authors
Li, Yuchen [1]
Xiong, Haoyi [2]
Kong, Linghe [1]
Bian, Jiang [2]
Wang, Shuaiqiang [2]
Chen, Guihai [1]
Yin, Dawei [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
Keywords
Learning to rank; Data reconstruction; Pre-training; Web search
DOI
10.1007/s10994-023-06469-9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
While learning to rank (LTR) is widely employed in web search to prioritize pertinent webpages among the retrieved contents for input queries, traditional LTR models suffer from two principal obstacles that lead to subpar performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering search queries across the popularity spectrum, and (2) ill-trained models that fail to induce generalized representations for LTR, culminating in overfitting. To tackle these challenges, we propose a Generative Semi-Supervised Pre-trained (GS²P) learning to rank model. Specifically, GS²P first generates pseudo-labels for unlabeled samples using tree-based LTR models after a series of co-training procedures, then learns representations of query-webpage pairs with self-attentive transformers using both a discriminative (LTR) loss and a generative (denoising autoencoding for reconstruction) loss. Finally, GS²P boosts LTR performance by incorporating Random Fourier Features to over-parameterize the model into the "interpolating regime", so as to enjoy the further descent of generalization error on the learned representations. We conduct extensive offline experiments on a publicly available dataset and a real-world dataset collected from a large-scale search engine. The results show that GS²P achieves the best performance on both datasets compared to the baselines. We also deploy GS²P at a large-scale web search engine with realistic traffic, where we again observe significant improvements; GS²P performs consistently in both online and offline experiments.
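The over-parameterization step in the abstract lends itself to a short illustration. The sketch below is not the authors' code, only a minimal rendering of the Random Fourier Features (RFF) idea: a learned low-dimensional query-webpage representation is lifted into a much higher-dimensional feature space so that a linear scorer on top has far more parameters than training samples, i.e., it enters the interpolating regime. All names (random_fourier_features, rff_dim, sigma) and the data shapes are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(x, rff_dim=8192, sigma=1.0, seed=0):
    """Lift representations x of shape (n, d) to (n, rff_dim) RFF
    features approximating an RBF kernel with bandwidth sigma."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.normal(scale=1.0 / sigma, size=(d, rff_dim))  # random projection directions
    b = rng.uniform(0.0, 2.0 * np.pi, size=rff_dim)       # random phase offsets
    return np.sqrt(2.0 / rff_dim) * np.cos(x @ w + b)

# Hypothetical usage: 1,000 query-webpage pairs with 128-dim learned
# representations and placeholder relevance scores.
reps = np.random.randn(1000, 128)
scores = np.random.randn(1000)

phi = random_fourier_features(reps)                      # (1000, 8192): rff_dim >> n
weights, *_ = np.linalg.lstsq(phi, scores, rcond=None)   # min-norm interpolating fit
print(np.abs(phi @ weights - scores).max())              # near-zero training error
```

With rff_dim much larger than the number of samples, the minimum-norm least-squares solution fits the training scores essentially exactly; this is the over-parameterized regime in which, as the abstract argues, generalization error can descend further on well-learned representations.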
Pages: 5331-5349
Page count: 19