Where are the large and difficult datasets?

Citations: 0

Authors:

Adrien Jamain

David J. Hand

Affiliations:

[1] BNP-Paribas, Department of Mathematics

[2] Institute for Mathematical Sciences

Source:

Advances in Data Analysis and Classification | 2009 / Volume 3

Keywords:

Error rate; Meta-analysis; Comparative studies; Repositories; 6207; 68T10

DOI:

Not available

Abstract:

A great many comparative performance assessments of classification rules have been undertaken, ranging from small ones involving just one or two methods, to large ones involving many tens of methods. We are undertaking a meta-analytic study of these studies, attempting to distil some overall conclusions. This paper describes just one of our observations. The dataset analysed in this paper contains 5,203 error rates taken from 45 articles and describing 146 datasets. One curious general relationship which was persistent in our data, despite the fact that we were looking at results mixed between distributions rather than conditional on distributions, was that error rate decreased with increasing dataset size. We believe this to be an artefact of the way datasets are collected by the research community.

Pages: 25-38

Number of pages: 13

50 items in total

[1] Where are the large and difficult datasets?
Jamain, Adrien
Hand, David J.
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2009, 3 (01) : 25 - 38
[2] Constructing Explicit Prejudice: Evidence From Large Sample Datasets
Lee, Kent M.
Lindquist, Kristen A.
Payne, B. Keith
PERSONALITY AND SOCIAL PSYCHOLOGY BULLETIN, 2023, 49 (04) : 541 - 553
[3] Analysis of large datasets for identifying molecular targets in intestinal polyps and metabolic disorders
Ou, Shan
Xu, Yun
Liu, Qinglan
Yang, Tianwen
Xiu, Wei Chen
Yuan, Xiu
Zuo, Xin
Shi, Peng
Yao, Jie
BIOCELL, 2024, 48 (03) : 415 - 429
[4] Transpapillary Biliary Cannulation is Difficult in Cases with Large Oral Protrusion of the Duodenal Papilla
Watanabe, Masafumi
Okuwaki, Kosuke
Kida, Mitsuhiro
Imaizumi, Hiroshi
Yamauchi, Hiroshi
Kaneko, Toru
Iwai, Tomohisa
Hasegawa, Rikiya
Miyata, Eiji
Masutani, Hironori
Tadehara, Masayoshi
Adachi, Kai
Koizumi, Wasaburo
DIGESTIVE DISEASES AND SCIENCES, 2019, 64 (08) : 2291 - 2299
[5] Endoscopic large balloon sphincteroplasty is a useful, safe adjunct for difficult to treat choledocholithiasis
Turner, Greg A.
Ing, Andrew J.
Connor, Saxon J.
ANZ JOURNAL OF SURGERY, 2016, 86 (05) : 395 - 398
[6] Large, open datasets for human connectomics research: Considerations for reproducible and responsible data use
Laird, Angela R.
NEUROIMAGE, 2021, 244
[7] Causes and Countermeasures of Difficult Selective Biliary Cannulation: A Large Sample Size Retrospective Study
Liu, Yang
Liu, Wei
Hong, Junbo
Li, Guohua
Chen, Youxiang
Xie, Yong
Zhou, Xiaojiang
SURGICAL LAPAROSCOPY ENDOSCOPY & PERCUTANEOUS TECHNIQUES, 2021, 31 (05) : 533 - 538
[8] Drug comparisons: why are they so difficult?
Salonen, R
CEPHALALGIA, 2000, 20 : 25 - 32
[9] Risk Prediction Model for Late Life Depression: Development and Validation on Three Large European Datasets
Cattelani, Luca
Murri, Martino Belvederi
Chesani, Federico
Chiari, Lorenzo
Bandinelli, Stefania
Palumbo, Pierpaolo
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2019, 23 (05) : 2196 - 2204
[10] Exploring changes in the invasion pattern of alien flora in Catalonia (NE of Spain) from large datasets
Girado-Beltran, Paola
Andreu, Jara
Pino, Joan
BIOLOGICAL INVASIONS, 2015, 17 (10) : 3015 - 3028
