DescribeML: A dataset description tool for machine learning

Cited by: 3
Authors
Giner-Miguelez, Joan [1]
Gomez, Abel [1]
Cabot, Jordi [2]
Affiliations
[1] Univ Oberta Catalunya UOC, Internet Interdisciplinary Inst, Barcelona, Spain
[2] Luxembourg Inst Sci & Technol LIST, Esch Sur Alzette, Luxembourg
Keywords
Datasets; Machine learning; Model-driven engineering; Fairness; Domain-specific languages
DOI
10.1016/j.scico.2023.103030
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Datasets are essential for training and evaluating machine learning models. However, they are also the root cause of many undesirable model behaviors, such as biased predictions. To address this issue, the machine learning community is proposing as a best practice the adoption of common guidelines for describing datasets. However, these guidelines are based on natural language descriptions of the dataset, hampering the automatic computation and analysis of such descriptions. To overcome this situation, we present DescribeML, a language engineering tool to precisely describe machine learning datasets in terms of their composition, provenance, and social concerns in a structured format. The tool is implemented as a Visual Studio Code extension. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 5