PMLB v1.0: an open-source dataset collection for benchmarking machine learning methods

Cited by: 14
Authors
Romano, Joseph D. [1 ,2 ]
Le, Trang T. [1 ]
La Cava, William [1 ]
Gregg, John T. [1 ]
Goldberg, Daniel J. [3 ]
Chakraborty, Praneel [4 ,5 ]
Ray, Natasha L. [6 ]
Himmelstein, Daniel [7 ,8 ]
Fu, Weixuan [1 ]
Moore, Jason H. [1 ]
Affiliations
[1] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[2] Univ Penn, Ctr Excellence Environm Toxicol, Philadelphia, PA 19104 USA
[3] Washington Univ, Dept Comp Sci & Engn, St Louis, MO 63130 USA
[4] Univ Penn, Sch Arts & Sci, Philadelphia, PA 19104 USA
[5] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA
[6] Princeton Day Sch, Princeton, NJ 08540 USA
[7] Related Sci, Denver, CO 80220 USA
[8] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Philadelphia, PA 19104 USA
Funding
US National Institutes of Health (NIH)
DOI
10.1093/bioinformatics/btab727
Chinese Library Classification
Q5 [Biochemistry]
Subject Classification Codes
071010; 081704
Abstract
Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows.
Results: This release of PMLB (Penn Machine Learning Benchmarks) provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods, aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community.
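The "standardized, user-friendly interface" described in the abstract is, on the Python side, the pmlb package accompanying the paper. The sketch below shows the intended workflow of pulling a benchmark dataset straight into a scikit-learn experiment; the names fetch_data, return_X_y, and classification_dataset_names reflect the package's documented interface, but the exact signatures should be checked against the version you install.

```python
# Minimal sketch of using PMLB with scikit-learn (assumes `pip install pmlb`).
# fetch_data and the dataset-name lists are part of the pmlb package's
# documented interface; verify argument names against your installed version.
from pmlb import fetch_data, classification_dataset_names

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Datasets are fetched by name; with return_X_y=True the call returns
# feature and target arrays instead of a single DataFrame.
X, y = fetch_data('mushroom', return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"Held-out accuracy on 'mushroom': {clf.score(X_test, y_test):.3f}")

# The package also exposes lists of dataset names, which makes it easy to
# sweep a new method across the full benchmark suite.
print(len(classification_dataset_names), "classification datasets available")
```

According to the package documentation, fetch_data also accepts a local cache directory so repeated benchmark sweeps do not re-download the data.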
Pages: 878-880
Number of pages: 3
References
12 items in total
  • [1] Caruana R., 2006, Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, p. 161
  • [2] Cortes C., 1995, Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), p. 57
  • [3] Dua D., 2019, UCI Machine Learning Repository
  • [4] Hastie T., Tibshirani R., Friedman J., 2009, The Elements of Statistical Learning, 2nd edn, Springer Series in Statistics, DOI 10.1007/978-0-387-84858-7
  • [5] Mangul S., Martin L.S., Hill B.L., Lam A.K.-M., Distler M.G., Zelikovsky A., Eskin E., Flint J. Systematic benchmarking of omics computational tools. Nature Communications, 2019, 10(1)
  • [6] Mitchell K., Brito J.J., Mandric I., Wu Q., Knyazev S., Chang S., Martin L.S., Karlsberg A., Gerasimov E., Littman R., Hill B.L., Wu N.C., Yang H.T., Hsieh K., Chen L., Littman E., Shabani T., Enik G., Yao D., Sun R., Schroeder J., Eskin E., Zelikovsky A., Skums P., Pop M., Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biology, 2020, 21(1)
  • [7] Nicolucci A., Rossi M.C., Pellegrini F., Lucisano G., Pintaudi B., Gentile S., Marra G., Skovlund S.E., Vespasiani G. Benchmarking network for clinical and humanistic outcomes in diabetes (BENCH-D) study: protocol, tools, and population. SpringerPlus, 2014, 3, 1-9
  • [8] Olson R.S., La Cava W., Orzechowski P., Urbanowicz R.J., Moore J.H. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining, 2017, 10
  • [9] Pedregosa F. et al., 2011, Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825
  • [10] Pezoa F., Reutter J.L., Suarez F., Ugarte M., Vrgoc D. Foundations of JSON Schema. Proceedings of the 25th International Conference on World Wide Web (WWW'16), 2016, 263-273