Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure

Cited by: 116
Authors
Hutchinson, Ben
Smart, Andrew
Hanna, Alex
Denton, Emily
Greer, Christina
Kjartansson, Oddur
Barnes, Parker
Mitchell, Margaret
Source
PROCEEDINGS OF THE 2021 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, FACCT 2021 | 2021
Keywords
datasets; requirements engineering; machine learning; PERFORMANCE; CHALLENGES; DILEMMAS; SCIENCE
DOI
10.1145/3442188.3445918
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Datasets that power machine learning are often used, shared, and reused with little visibility into the processes of deliberation that led to their creation. As artificial intelligence systems are increasingly used in high-stakes tasks, system development and deployment practices must be adapted to address the very real consequences of how model development data is constructed and used in practice. This includes greater transparency about data, and accountability for decisions made when developing it. In this paper, we introduce a rigorous framework for dataset development transparency that supports decision-making and accountability. The framework uses the cyclical, infrastructural and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each stage of the data development lifecycle yields documents that facilitate improved communication and decision-making, as well as drawing attention to the value and necessity of careful data work. The proposed framework makes visible the often overlooked work and decisions that go into dataset creation, a critical step in closing the accountability gap in artificial intelligence and a critical/necessary resource aligned with recent work on auditing processes.
Pages: 560-575
Page count: 16