Do We Train on Test Data? Purging CIFAR of Near-Duplicates

Cited by: 45

Authors:

Barz, Bjoern [1]

Denzler, Joachim [1]

Affiliations:

[1] Friedrich Schiller Univ Jena, Comp Vis Grp, Ernst Abbe Pl 2, D-07743 Jena, Germany

Source:

JOURNAL OF IMAGING | 2020 / Vol. 6 / Issue 06

Keywords:

image classification; deep learning; reproducibility; duplicates; image retrieval

DOI:

10.3390/jimaging6060041

Chinese Library Classification (CLC) number:

TB8 [Photographic technology]

Discipline classification code:

0804

Abstract:

The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets, respectively, have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the "fair CIFAR" (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art.
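The abstract does not say how the duplicates were located, so the following is only an illustrative sketch of one common way to shortlist test images that may have a near-duplicate in the training set: compare per-image feature vectors by cosine similarity and flag each test image whose nearest training neighbor exceeds a threshold. The function names, the toy feature vectors, and the 0.95 threshold are all assumptions made for illustration; flagged pairs would still need visual inspection before being treated as duplicates.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def find_duplicate_candidates(test_feats, train_feats, threshold=0.95):
    """Return (test_idx, train_idx, similarity) for each test vector whose
    most similar training vector reaches the (hypothetical) threshold."""
    candidates = []
    for test_idx, test_vec in enumerate(test_feats):
        best_idx, best_sim = -1, -1.0
        for train_idx, train_vec in enumerate(train_feats):
            sim = cosine_similarity(test_vec, train_vec)
            if sim > best_sim:
                best_idx, best_sim = train_idx, sim
        if best_sim >= threshold:
            candidates.append((test_idx, best_idx, best_sim))
    return candidates


# Toy feature vectors standing in for image descriptors (assumed data).
train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
test = [[0.99, 0.05, 0.0], [0.0, 0.0, 1.0]]
pairs = find_duplicate_candidates(test, train)
```

With these toy vectors, only the first test vector is flagged, paired with the first training vector. The brute-force loop is quadratic in the number of images, which is acceptable for a sketch but would normally be replaced by a vectorized or indexed nearest-neighbor search on datasets of CIFAR's size.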


Pages: 8

