Identifying Highly Relevant Entries in Datasets: A Relevance-Based Classification

Citations: 0
Authors
Delbianco, Fernando [1 ]
Tohme, Fernando [1 ]
Affiliations
[1] UNS, Dept Econ, Inst Matemat Bahia Blanca, CONICET, San Andres 800,B8002, Bahia Blanca, Argentina
Keywords
Conformal prediction; Individualized inference; Synthetic data; Shapley
DOI
10.1007/s00357-025-09513-6
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
In this paper, we present a methodology for classifying the entries of a dataset according to their relevance for answering specific queries. It uses repeated individualized inference to identify entries with significant Shapley values, that is, entries that contribute to accurate answers to queries about other entries in the dataset. This information is captured in three matrices: a general relevance matrix, a Shapley value matrix, and a significant Shapley value matrix. Because information is usually distributed non-homogeneously across a dataset, relevance is often concentrated in a few entries. This concentration is observed, in particular, in a representative case study.
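The idea of a Shapley value matrix over dataset entries can be illustrated with a toy sketch. It is not the authors' procedure: the abstract does not specify the inference model, the payoff function, or the significance test, so everything below is an assumption made for illustration. Here each row is a query about one entry, the remaining entries are the players, and the payoff of a coalition is the negative squared error obtained when the queried value is predicted by the coalition mean. The names `coalition_value`, `shapley_row`, `shapley_matrix`, and `flag_large` are hypothetical, and the fixed threshold in `flag_large` is only a stand-in for a proper significance criterion.

```python
from itertools import combinations
from math import factorial


def coalition_value(ys, members, query):
    """Payoff of a coalition: negative squared error when ys[query]
    is predicted by the mean of the coalition (assumed payoff)."""
    if not members:
        # Assumption: an empty coalition predicts 0.
        return -(ys[query] ** 2)
    pred = sum(ys[i] for i in members) / len(members)
    return -((ys[query] - pred) ** 2)


def shapley_row(ys, query):
    """Exact Shapley value of every other entry for one query entry."""
    n = len(ys)
    players = [i for i in range(n) if i != query]
    m = len(players)
    row = [0.0] * n  # the queried entry itself keeps value 0
    for i in players:
        others = [j for j in players if j != i]
        for k in range(m):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for subset in combinations(others, k):
                gain = (coalition_value(ys, subset + (i,), query)
                        - coalition_value(ys, subset, query))
                row[i] += weight * gain
    return row


def shapley_matrix(ys):
    """Row q holds the Shapley values of all entries for the query on q."""
    return [shapley_row(ys, q) for q in range(len(ys))]


def flag_large(matrix, threshold):
    """Hypothetical stand-in for a significance filter: 1 where the
    Shapley value exceeds a fixed threshold, 0 elsewhere."""
    return [[1 if v > threshold else 0 for v in row] for row in matrix]


ys = [1.0, 1.1, 0.9, 5.0]  # toy outcomes, made up for illustration
phi = shapley_matrix(ys)
flags = flag_large(phi, 0.0)
```

Exact enumeration is exponential in the number of entries, so a sketch like this is only workable for very small datasets; larger ones would need a sampling approximation.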
Pages: 21