LIFEDATA - A Framework for Traceable Active Learning Projects

Cited by: 1
Authors
Stieler, Fabian [1, 3]
Elia, Miriam [1]
Weigell, Benjamin [1]
Bauer, Bernhard [1, 3]
Kienle, Peter [2]
Roth, Anton [2]
Muellegger, Gregor [2]
Nann, Marius [2]
Dopfer, Sarah [2]
Affiliations
[1] Univ Augsburg, Inst Comp Sci, Augsburg, Germany
[2] GS Elekt Med Gerate G Stemple GmbH, Kaufering, Germany
[3] Ctr Responsible AI Technol, Munich, Germany
Source
2023 IEEE 31ST INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW | 2023
Keywords
Active Learning; Data Labeling; Traceability; Data-Centric AI; Python Framework; Open Source; MODEL;
DOI
10.1109/REW57809.2023.00088
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Active Learning has become a popular method for iteratively improving data-intensive Artificial Intelligence models. However, it often presents a significant challenge when dealing with large volumes of volatile data in projects, as with an Active Learning loop. This paper introduces LIFEDATA, a Python-based framework designed to assist developers in implementing Active Learning projects focusing on traceability. It supports seamless tracking of all artifacts, from data selection and labeling to model interpretation, thus promoting transparency throughout the entire model learning process and enhancing error debugging efficiency while ensuring experiment reproducibility. To showcase its applicability, we present two life science use cases. Moreover, the paper proposes an algorithm that combines query strategies to demonstrate LIFEDATA's ability to reduce data labeling effort.
Pages: 465-474
Page count: 10