Valuing Training Data via Causal Inference for In-Context Learning

Cited by: 1
Authors
Zhou, Xiaoling [1 ]
Ye, Wei [1 ]
Lee, Zhemg [2 ]
Zou, Lei [3 ]
Zhang, Shikun [1 ]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing 100871, Peoples R China
[2] Tianjin Univ, Tianjin 300072, Peoples R China
[3] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
Keywords
Training; Training data; Cost accounting; Robustness; Data models; Reviews; Linear regression; Semantics; Electronic mail; Computational efficiency; In-context learning; data valuation; causal inference; average marginal effect; elastic net regression; regularization
DOI
10.1109/TKDE.2025.3546761
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In-context learning (ICL) empowers large pre-trained language models (PLMs) to predict outcomes for unseen inputs without parameter updates. However, the efficacy of ICL heavily depends on the choice of demonstration examples, and randomly selecting them from the training set frequently leads to inconsistent performance. To address this challenge, this study takes a novel approach: valuing training data through causal inference. Specifically, we introduce the concept of average marginal effect (AME) to quantify the contribution of each individual training sample to ICL performance, covering both generalization and robustness. Drawing inspiration from multiple treatment effects and randomized experiments, we first sample diverse training subsets to construct prompts and evaluate ICL performance under each prompt. Subsequently, we employ Elastic Net regression to jointly estimate the AME values of all training data from the subset compositions and the corresponding inference performance. Finally, we prioritize the highest-valued samples as demonstrations when prompting inference on test data. Across various tasks and with seven PLMs ranging from 0.8B to 33B parameters, our approach consistently achieves state-of-the-art performance. In particular, it outperforms Vanilla ICL and the best-performing baseline by an average of 14.1% and 5.2%, respectively. Moreover, prioritizing the most valuable samples for prompting significantly enhances performance stability and robustness across various learning scenarios. Notably, the valuable samples transfer across diverse PLMs and generalize well to out-of-distribution tasks.
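The valuation pipeline the abstract describes — sample training subsets, score ICL with a prompt built from each subset, then regress the scores on subset membership to recover per-sample AME values — can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the paper's implementation: the function names and hyperparameters (`alpha`, `l1_ratio`) are placeholders, and a real system would obtain `scores` by running the PLM on held-out data rather than from a synthetic formula.

```python
import random

def _soft_threshold(z, t):
    """Soft-thresholding operator for the L1 part of Elastic Net."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_ame(X, y, alpha=0.01, l1_ratio=0.5, iters=100):
    """Estimate per-sample AME values by coordinate-descent Elastic Net.

    X: 0/1 subset-membership matrix (one row per sampled prompt subset,
       one column per training sample); y: ICL score achieved with each
       subset. The fitted coefficient beta[j] approximates sample j's
       average marginal effect on ICL performance.
    """
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(iters):
        for j in range(d):
            # Correlation of feature j with the partial residual
            # (prediction from all other coordinates held fixed).
            rho, norm = 0.0, 0.0
            for i in range(n):
                pred_others = sum(X[i][k] * beta[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
                norm += X[i][j] ** 2
            rho /= n
            denom = norm / n + alpha * (1.0 - l1_ratio)  # L2 shrinkage
            beta[j] = _soft_threshold(rho, alpha * l1_ratio) / denom
    return beta

# Toy demonstration: samples 0 and 1 genuinely help ICL, the rest do not.
random.seed(0)
true_effects = [0.3, 0.2, 0.0, 0.0, 0.0, 0.0]
subsets = [[1 if random.random() < 0.5 else 0 for _ in true_effects]
           for _ in range(60)]
scores = [sum(e * m for e, m in zip(true_effects, row)) for row in subsets]
ame = elastic_net_ame(subsets, scores)
ranking = sorted(range(len(ame)), key=lambda j: -ame[j])  # best samples first
```

Treating binary subset membership as regression covariates mirrors the randomized-experiment framing of AME; the Elastic Net penalty is a natural choice here because memberships across sampled subsets are correlated and the L1 term zeroes out uninformative samples.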
Pages: 3824-3840
Page count: 17