Self-Calibrated Listwise Reranking with Large Language Models

Cited: 0
Authors
Ren, Ruiyang [1 ]
Wang, Yuhao [1 ]
Zhou, Kun [2 ]
Zhao, Wayne Xin [1 ]
Wang, Wenjie [3 ]
Liu, Jing [4 ]
Wen, Ji-Rong [1 ]
Chua, Tat-Seng [5 ]
Affiliations
[1] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Baidu Inc, Beijing, Peoples R China
[5] Natl Univ Singapore, NExT Res Ctr, Singapore, Singapore
Source
PROCEEDINGS OF THE ACM WEB CONFERENCE 2025, WWW 2025 | 2025
Funding
National Natural Science Foundation of China
Keywords
Text Reranking; Self-Calibration; Large Language Models;
DOI
10.1145/3696410.3714658
Chinese Library Classification
TP39 [Computer Applications]
Discipline Classification Code
081203; 0835
Abstract
Large language models (LLMs), with advanced linguistic capabilities, have been employed in reranking tasks through a sequence-to-sequence approach. In this paradigm, multiple passages are reranked in a listwise manner and a textual reranked permutation is generated. However, due to the limited context window of LLMs, this reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. This not only increases computational costs but also prevents the LLM from fully capturing the comparison information across all candidates. To address these challenges, we propose a novel self-calibrated listwise reranking method, which leverages LLMs to produce global relevance scores for ranking. To achieve this, we first propose a relevance-aware listwise reranking framework, which incorporates explicit list-view relevance scores to improve reranking efficiency and enable global comparison across the entire candidate set. Second, to ensure the comparability of the computed scores, we propose self-calibrated training, which uses point-view relevance assessments generated internally by the LLM itself to calibrate the list-view relevance assessments. Extensive experiments and comprehensive analysis on the BEIR benchmark and the TREC Deep Learning Tracks demonstrate the effectiveness and efficiency of our proposed method.
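The sliding-window strategy that the abstract critiques can be sketched as follows. This is a minimal illustration, not the paper's implementation: `score_window` is a hypothetical stand-in for a single LLM listwise call that returns one window of passages reordered best-first, and the `window`/`step` sizes are illustrative defaults rather than the paper's settings.

```python
def sliding_window_rerank(candidates, score_window, window=4, step=2):
    """Rerank `candidates` back-to-front with overlapping windows.

    `score_window` stands in for one LLM listwise call: given a list of
    passages, it returns the same passages reordered best-first. Moving
    the window from the bottom toward the top lets highly relevant
    passages "bubble up" through the window overlap, at the cost of one
    LLM call per window position. `step` must be positive.
    """
    ranked = list(candidates)
    start = max(len(ranked) - window, 0)
    while True:
        # Rerank the current window in place.
        ranked[start:start + window] = score_window(ranked[start:start + window])
        if start == 0:
            break
        # Slide the window upward, keeping `window - step` passages of overlap.
        start = max(start - step, 0)
    return ranked
```

Because each passage is only ever compared with the others inside its window, the procedure needs multiple overlapping calls to move a relevant passage from the bottom to the top, which is exactly the cost and the loss of global comparison that the proposed list-view relevance scores are meant to avoid.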
Pages: 3692-3701
Page count: 10