Learning to Rank from Noisy Data

Cited by: 6
Authors
Ding, Wenkui [1]
Geng, Xiubo [2]
Zhang, Xu-Dong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Yahoo Labs Beijing, Beijing, Peoples R China
Keywords
Noisy data; robust learning
DOI
10.1145/2576230
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning to rank, which learns a ranking function from training data, has emerged as a research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, but this assumption does not always hold. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for ordinary annotators to give reliable relevance labels to some documents. As a result, the relevance labels in learning-to-rank training data usually contain noise, and ignoring this noise degrades the performance of learning-to-rank algorithms. In this article, we propose accounting for labeling noise during learning to rank and use a two-step approach to extend existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by maximum likelihood estimation. The conditional probability of the relevance label given the feature vector of a document is then computed. If this probability is large, we regard the degree of labeling noise for the document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees of labeling noise. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet.
Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
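The second step described in the abstract, weighting training documents by their estimated label reliability inside the loss, can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not give the exact weighting scheme, so the sketch assumes (hypothetically) that each document's weight is simply its estimated conditional probability P(label | features), that a pair's weight is the product of its two document weights, and that the base loss is a RankNet-style pairwise logistic loss. The function names `noise_weights` and `weighted_pairwise_loss` are illustrative only.

```python
import math


def noise_weights(label_probs):
    """Turn estimated P(label | features) into per-document weights.

    Assumption for illustration: the weight equals the probability,
    clipped to [0, 1]. A high probability means low estimated labeling
    noise and therefore a larger weight.
    """
    return [min(max(p, 0.0), 1.0) for p in label_probs]


def weighted_pairwise_loss(scores, labels, weights):
    """Pairwise logistic loss with per-pair weights for one query.

    For every pair (i, j) where document i is labeled more relevant than
    document j, the loss log(1 + exp(-(s_i - s_j))) is scaled by
    weights[i] * weights[j] (an assumed combination rule).
    """
    total = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                diff = scores[i] - scores[j]
                # Numerically stable form of log(1 + exp(-diff)).
                pair_loss = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
                total += weights[i] * weights[j] * pair_loss
    return total


# Usage: the model scores document 0 above document 1, but the labels say
# the opposite. If document 0's label is judged unreliable, the pair
# contributes less to the loss.
scores = [2.0, 0.0]
labels = [0, 1]
clean = weighted_pairwise_loss(scores, labels, noise_weights([1.0, 1.0]))
noisy = weighted_pairwise_loss(scores, labels, noise_weights([0.1, 1.0]))
```

In this sketch a suspected mislabeled document still contributes to training, only with reduced influence, which matches the abstract's description of down-weighting rather than discarding noisy documents.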
Pages: 21