Learning to Rank from Noisy Data

Cited by: 6
Authors
Ding, Wenkui [1 ]
Geng, Xiubo [2 ]
Zhang, Xu-Dong [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Yahoo Labs Beijing, Beijing, Peoples R China
Keywords
Noisy data; robust learning;
DOI
10.1145/2576230
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning to rank, which learns the ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, which is not always true, however. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to give reliable relevance labels to some documents. As a result, the relevance labels in the training data of learning to rank usually contain noise. If we ignore this fact, the performance of learning-to-rank algorithms will be damaged. In this article, we propose considering the labeling noise in the process of learning to rank and using a two-step approach to extend existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for a training document. To this end, we assume that the majority of the relevance labels in the training data are reliable and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters in the graphical model are learned by means of maximum likelihood estimation. Then the conditional probability of the relevance label given the feature vector of a document is computed. If the probability is large, we regard the degree of labeling noise for this document as small; otherwise, we regard the degree as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to those training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees of labeling noise. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. 
Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
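The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the graphical model, so step 1 here substitutes a simple per-label diagonal Gaussian fitted by maximum likelihood, and step 2 uses a RankNet-style pairwise logistic loss in which each pair is weighted by the product of the two documents' label confidences. The function names, the variance floor, and the pair-weighting rule are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def label_confidence(features, labels):
    """Step 1 stand-in: estimate p(label_i | x_i) for every document.

    Assumption: one diagonal Gaussian per relevance label, fitted by maximum
    likelihood. A low probability is read as a high degree of labeling noise.
    """
    by_label = defaultdict(list)
    for x, y in zip(features, labels):
        by_label[y].append(x)
    dim = len(features[0])
    params = {}
    for y, xs in by_label.items():
        n = len(xs)
        mean = [sum(x[d] for x in xs) / n for d in range(dim)]
        # Small variance floor (hypothetical choice) avoids division by zero.
        var = [sum((x[d] - mean[d]) ** 2 for x in xs) / n + 1e-3
               for d in range(dim)]
        params[y] = (n / len(features), mean, var)

    def log_joint(x, y):
        prior, mean, var = params[y]
        total = math.log(prior)
        for d in range(dim):
            total -= 0.5 * math.log(2 * math.pi * var[d])
            total -= (x[d] - mean[d]) ** 2 / (2 * var[d])
        return total

    conf = []
    for x, y in zip(features, labels):
        logs = {c: log_joint(x, c) for c in params}
        top = max(logs.values())  # subtract the max for numerical stability
        norm = sum(math.exp(v - top) for v in logs.values())
        conf.append(math.exp(logs[y] - top) / norm)
    return conf


def weighted_pairwise_loss(scores, labels, conf):
    """Step 2 sketch: pairwise logistic loss with noise-aware pair weights.

    Assumption: a pair's weight is the product of both documents' confidences,
    so pairs involving a likely-mislabeled document contribute less.
    """
    total, weight_sum = 0.0, 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                w = conf[i] * conf[j]
                total += w * math.log1p(math.exp(-(scores[i] - scores[j])))
                weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Toy query: the fourth document looks irrelevant but is labeled relevant.
features = [[0.9], [1.0], [1.1], [0.05], [0.0], [0.1], [-0.1]]
labels = [1, 1, 1, 1, 0, 0, 0]
conf = label_confidence(features, labels)
scores = [x[0] for x in features]  # hypothetical ranker output
print([round(c, 3) for c in conf])
print(weighted_pairwise_loss(scores, labels, conf))
```

In this toy data the suspicious document receives the lowest confidence, so the pairs it takes part in are down-weighted in the loss. The same per-document weights could in principle be attached to pointwise or boosting-style losses, which is the direction the abstract describes for McRank, RankSVM, and RankBoost.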
Pages: 21
Related papers
50 in total
  • [1] Learning Programs from Noisy Data
    Raychev, Veselin
    Bielik, Pavol
    Vechev, Martin
    Krause, Andreas
    ACM SIGPLAN NOTICES, 2016, 51 (01) : 761 - 774
  • [2] Learning programs from noisy data
    Raychev V.
    Bielik P.
    Vechev M.
    Krause A.
    1600, Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, United States (51) : 761 - 774
  • [3] CRSSC: Salvage Reusable Samples from Noisy Data for Robust Learning
    Sun, Zeren
    Hua, Xian-Sheng
    Yao, Yazhou
    Wei, Xiu-Shen
    Hu, Guosheng
    Zhang, Jian
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 92 - 101
  • [4] Learning Robust Data-Based LQG Controllers From Noisy Data
    Liu, Wenjie
    Wang, Gang
    Sun, Jian
    Bullo, Francesco
    Chen, Jie
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (12) : 8526 - 8538
  • [5] Vacillatory and BC learning on noisy data
    Case, J
    Jain, S
    Stephan, F
    THEORETICAL COMPUTER SCIENCE, 2000, 241 (1-2) : 115 - 141
  • [6] Uncertainty-based learning of acoustic models from noisy data
    Ozerov, Alexey
    Lagrange, Mathieu
    Vincent, Emmanuel
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03) : 874 - 894
  • [7] Comparison of Machine Learning Algorithms on Noisy Data
    Oreski, Dijana
    Visnjic, Dunja
    Kadoic, Nikola
    CENTRAL EUROPEAN CONFERENCE ON INFORMATION AND INTELLIGENT SYSTEMS, CECIIS, 2023, : 383 - 389
  • [8] APPROXIMATION FROM NOISY DATA
    Dong, Bin
    Shen, Zuowei
    Yang, Jianbin
    SIAM JOURNAL ON NUMERICAL ANALYSIS, 2021, 59 (05) : 2722 - 2745
  • [9] Bayesian deep learning with hierarchical prior: Predictions from limited and noisy data
    Luo, Xihaier
    Kareem, Ahsan
    STRUCTURAL SAFETY, 2020, 84
  • [10] Robust learning from noisy web data for fine-Grained recognition
    Cai, Zhenhuang
    Xie, Guo-Sen
    Huang, Xingguo
    Huang, Dan
    Yao, Yazhou
    Tang, Zhenmin
    PATTERN RECOGNITION, 2023, 134