Towards a framework for interoperability and reproducibility of predictive models

Cited by: 2

Authors:

Al Rahrooh ^{[1]}

Garlid, Anders O. ^{[1]}

Bartlett, Kelly ^{[1]}

Coons, Warren ^{[1]}

Petousis, Panayiotis ^{[2]}

Hsu, William ^{[1]}

Bui, Alex A. T. ^{[1,2]}

Affiliations:

[1] Univ Calif Los Angeles UCLA, Med & Imaging Informat MII Grp, Los Angeles, CA 90024 USA

[2] Univ Calif Los Angeles UCLA, Clin & Translat Sci Inst CTSI, Los Angeles, CA USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2024 / Volume 149

Funding:

National Institutes of Health (NIH), USA;

Keywords:

Blueprints - Markup languages - Pipelines;

DOI:

10.1016/j.jbi.2023.104551

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

The development and deployment of machine learning (ML) models for biomedical research and healthcare currently lack standard methodologies. Although tools for model replication are numerous, without a unifying blueprint it remains difficult to scientifically reproduce predictive ML models for any number of reasons (e.g., assumptions regarding data distributions and preprocessing, or unclear test metrics), and, ultimately, questions around generalizability and transportability are not readily answered. To facilitate scientific reproducibility, we built upon the Predictive Model Markup Language (PMML) to capture essential information. As a key component of the PREdictive Model Index and Exchange REpository (PREMIERE) platform, we present the Automated Metadata Pipeline (AMP) for conversion of a given predictive ML model into an extended PMML file that autocompletes an ML-based checklist, assessing model elements for interoperability and reproducibility. We demonstrate this pipeline on multiple test cases with three different ML algorithms and health-related datasets, providing a foundation for future predictive model reproducibility, sharing, and comparison.


Pages: 9