LIFEDATA - A Framework for Traceable Active Learning Projects

Cited by: 1
Authors
Stieler, Fabian [1, 3]
Elia, Miriam [1]
Weigell, Benjamin [1]
Bauer, Bernhard [1, 3]
Kienle, Peter [2]
Roth, Anton [2]
Muellegger, Gregor [2]
Nann, Marius [2]
Dopfer, Sarah [2]
Affiliations
[1] Univ Augsburg, Inst Comp Sci, Augsburg, Germany
[2] GS Elekt Med Gerate G Stemple GmbH, Kaufering, Germany
[3] Ctr Responsible AI Technol, Munich, Germany
Source
2023 IEEE 31ST INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW | 2023
Keywords
Active Learning; Data Labeling; Traceability; Data-Centric AI; Python Framework; Open Source; MODEL
DOI
10.1109/REW57809.2023.00088
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Active Learning has become a popular method for iteratively improving data-intensive Artificial Intelligence models. However, handling the large volumes of volatile data that an Active Learning loop produces is often a significant challenge in projects. This paper introduces LIFEDATA, a Python-based framework designed to assist developers in implementing Active Learning projects with a focus on traceability. It supports seamless tracking of all artifacts, from data selection and labeling to model interpretation, thus promoting transparency throughout the entire model learning process and enhancing error debugging efficiency while ensuring experiment reproducibility. To showcase its applicability, we present two life science use cases. Moreover, the paper proposes an algorithm that combines query strategies to demonstrate LIFEDATA's ability to reduce data labeling effort.
Pages: 465-474
Page count: 10