IW-ViT: Independence-Driven Weighting Vision Transformer for out-of-distribution generalization

Cited by: 0
Authors
Liu, Weifeng [1 ]
Yu, Haoran [1 ]
Wang, Yingjie [2 ]
Liu, Baodi [1 ]
Tao, Dapeng [3 ]
Chen, Honglong [1 ]
Affiliations
[1] China Univ Petr East China, Coll Control Sci & Engn, Qingdao 266580, Peoples R China
[2] China Univ Petr East China, Qingdao 266580, Peoples R China
[3] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650091, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Vision Transformer; Independence sample weighting; Feature decorrelation; Out-of-distribution generalization;
DOI
10.1016/j.patcog.2024.111308
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformer has shown excellent performance in various computer vision applications under the independent and identically distributed (i.i.d.) assumption. However, when the test distribution differs from the training distribution, model performance drops significantly. To address this problem, we propose to use independence sample weighting to improve the model's out-of-distribution (OOD) generalization ability. It learns a set of sample weights that eliminate the spurious correlations between irrelevant features and labels by removing the dependencies between features. Previous work based on independence sample weighting learned sample weights only from the final output of the feature extractor to optimize the model. In contrast, we account for the differences in spurious correlations across the layers of the feature extraction process. Combining the modular architecture of ViT with independence sample weighting, we propose the Independence-Driven Weighting Vision Transformer (IW-ViT) for out-of-distribution generalization. IW-ViT is built from a specialized encoder block, the IW-Block, each of which incorporates an independence sample weighting module. Every IW-Block learns its own set of sample weights and generates a weighted loss function to differentially eliminate the spurious correlations in different blocks. We conduct extensive experiments on various datasets. The results demonstrate that IW-ViT significantly outperforms previous work in different OOD generalization settings.
Pages: 12
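The record does not give the exact objective used inside the independence sample weighting module. The following is a minimal sketch of the idea described in the abstract, assuming a weighted-covariance decorrelation criterion as a stand-in for the paper's independence measure (e.g. an HSIC-style statistic). All names (IWBlock, learn_sample_weights, decorrelation_loss) and hyperparameters are illustrative, not the authors' implementation.

import torch
import torch.nn as nn


def decorrelation_loss(features, weights):
    # Sum of squared off-diagonal entries of the weighted feature covariance:
    # smaller values mean the reweighted feature dimensions are closer to independent.
    w = torch.softmax(weights, dim=0).unsqueeze(1)        # (N, 1) normalized sample weights
    mean = (w * features).sum(dim=0, keepdim=True)        # weighted feature mean
    centered = features - mean
    cov = (w * centered).t() @ centered                   # weighted covariance, (D, D)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum()


def learn_sample_weights(features, steps=50, lr=0.1):
    # Learn one weight per sample so that, under the reweighted distribution,
    # dependencies between feature dimensions are reduced.
    logits = torch.zeros(features.size(0), requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        decorrelation_loss(features.detach(), logits).backward()
        opt.step()
    return torch.softmax(logits.detach(), dim=0)


class IWBlock(nn.Module):
    # A standard pre-norm ViT encoder block; sample weights are learned from
    # this block's own output tokens rather than only from the final layer.
    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    block = IWBlock()
    tokens = torch.randn(8, 16, 192)                      # (batch, num_tokens, dim)
    block_feats = block(tokens).mean(dim=1)               # per-sample features of this block
    w = learn_sample_weights(block_feats)                 # block-specific sample weights
    per_sample_loss = torch.rand(8)                       # placeholder per-sample task losses
    weighted_loss = (w * per_sample_loss).sum()           # weighted loss for this block
    print(float(weighted_loss))

In the paper, each IW-Block presumably contributes such a block-specific weighted loss to the overall training objective; here a random placeholder stands in for the per-sample task loss.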