Towards a better evaluation of out-of-domain generalization

Cited by: 0

Authors:

Hwang, Duhun [1]

Kang, Suhyun [2]

Eo, Moonjung [3]

Kim, Jimyeong [2]

Rhee, Wonjong [2,4,5]

Affiliations:

[1] NAVER, Shopping Fdn Models Team, Seongnam, South Korea

[2] Seoul Natl Univ, Dept Intelligence & Informat, Seoul, South Korea

[3] LG AI Res, Data Language Lab, Seoul, South Korea

[4] Seoul Natl Univ, IPAI, Seoul, South Korea

[5] Seoul Natl Univ, RICS, Seoul, South Korea

Source:

NEURAL NETWORKS | 2025 / Vol. 188

Funding:

National Research Foundation, Singapore;

Keywords:

Domain generalization; Evaluation measure; Out-of-domain generalization;

DOI:

10.1016/j.neunet.2025.107434

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The objective of Domain Generalization (DG) is to devise algorithms capable of achieving high performance on previously unseen test distributions. In pursuit of this objective, the average measure has been the prevalent measure for comparing algorithms in existing DG studies. Despite its significance, the average measure has not been explored comprehensively, and its suitability for approximating the true domain generalization performance has been questionable. In this study, we carefully investigate the limitations inherent in the average measure and propose the worst+gap measure as a robust alternative. We establish theoretical grounds for the proposed measure by deriving two theorems starting from two different assumptions. Although the two assumptions are independent, we show that both theorems lead to a common insight. We conduct extensive experiments to compare the proposed worst+gap measure with the conventional average measure. Because studying measures requires access to the true DG performance, we modify five existing datasets to create the SR-CMNIST, CCats&Dogs, L-CIFAR10, PACS-corrupted, and VLCS-corrupted datasets. The experimental results reveal that the average measure approximates the true DG performance poorly and confirm the robustness of the theoretically supported worst+gap measure.


Pages: 12

