Where are the large and difficult datasets?

Times cited: 0
Authors
Adrien Jamain
David J. Hand
Affiliations
[1] BNP-Paribas, Department of Mathematics
[2] Institute for Mathematical Sciences
Source
Advances in Data Analysis and Classification | 2009 / Vol. 3
Keywords
Error rate; Meta-analysis; Comparative studies; Repositories; 62-07; 68T10
DOI
Not available
Abstract
A great many comparative performance assessments of classification rules have been undertaken, ranging from small ones involving just one or two methods to large ones involving many tens of methods. We are undertaking a meta-analysis of these studies, attempting to distil some overall conclusions. This paper describes just one of our observations. The dataset analysed in this paper contains 5,203 error rates, taken from 45 articles and describing 146 datasets. One curious general relationship persisted in our data, even though we were looking at results pooled across distributions rather than conditional on any one distribution: error rate decreased as dataset size increased. We believe this to be an artefact of the way the research community collects datasets.
Pages: 25-38
Number of pages: 13
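The abstract's central observation, that error rate tends to fall as dataset size grows, is a claim about a monotone trend. One simple way to quantify such a trend is Spearman's rank correlation between dataset size and reported error rate. The following is an illustrative sketch under that assumption, not the authors' analysis: the function names `average_ranks` and `spearman_rho` and the five (size, error rate) pairs are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical toy values (not from the paper): dataset sizes and error rates.
sizes = [150, 768, 3000, 20000, 60000]
errors = [0.05, 0.25, 0.12, 0.08, 0.03]
print(spearman_rho(sizes, errors))
```

With these toy values the function returns about -0.4; a negative coefficient means that larger datasets tend to have lower error rates. Working with ranks rather than raw values keeps the result insensitive to the very skewed scale of dataset sizes, and averaging tied ranks handles repeated sizes or error rates.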