Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework

Cited by: 15

Authors:

Estevez, Melissa [1]

Benedum, Corey M. [1]

Jiang, Chengsheng [1]

Cohen, Aaron B. [1, 2]

Phadke, Sharang [1]

Sarkar, Somnath [1]

Bozkurt, Selen [1]

Affiliations:

[1] Flatiron Health Inc, 233 Spring St, New York, NY 10013, USA

[2] NYU Grossman School of Medicine, Department of Medicine, New York, NY 10016, USA

Source:

CANCERS | 2022 / Vol. 14 / No. 13

Keywords:

artificial intelligence; deep learning; machine learning; oncology; personalized medicine; ARTIFICIAL-INTELLIGENCE; ONCOLOGY; HEALTH;

DOI:

10.3390/cancers14133063

Chinese Library Classification (CLC) number:

R73 [Oncology];

Subject classification code:

100214;

Abstract:

Simple Summary: Many patient clinical characteristics, such as diagnosis dates, biomarker status, and therapies received, are available only as unstructured text in electronic health records. Obtaining this information for research purposes is a difficult and costly process that requires trained clinical experts to manually review patient documents. Machine learning techniques offer a promising solution for efficiently extracting clinically relevant information from the unstructured text in patient documents. However, using data produced with machine learning techniques for research introduces unique challenges in assessing validity and generalizability to different cohorts of interest. To enable the effective and accurate use of such data for research purposes, we developed an evaluation framework for use by model developers, data users, and other stakeholders. This framework can serve as a baseline to contextualize the quality, strengths, and limitations of using data produced with machine learning techniques for research purposes.

A vast amount of real-world data, such as pathology reports and clinical notes, is captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when large datasets are needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks, such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest.
To enable the effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, users of ML-extracted data, and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods, with the goal of maximizing the use of EHR data for research purposes.

Pages: 12

References: 38 entries in total
