Detecting irregularities in randomized controlled trials using machine learning

Authors
Nelson, Walter [1]
Petch, Jeremy [1,2,3]
Ranisau, Jonathan
Zhao, Robin
Balasubramanian, Kumar
Bangdiwala, Shrikant I. [2]
Affiliations
[1] Hamilton Hlth Sci, Ctr Data Sci & Digital Hlth, Hamilton, ON, Canada
[2] Populat Hlth Res Inst, 20 Copeland Ave, Hamilton L8L 2X2, ON, Canada
[3] McMaster Univ, Dept Med, Hamilton, ON, Canada
Keywords
Central statistical monitoring; machine learning; artificial intelligence; outlier detection; quality assurance; data quality; randomized controlled trials; dabigatran
DOI
10.1177/17407745241297947
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine]
Subject classification code
1001
Abstract
Background: Over the course of a clinical trial, irregularities may arise in the data. Trialists implement human-intensive, expensive central statistical monitoring procedures to identify and correct these irregularities before the results of the trial are analyzed and disseminated. Machine learning algorithms have shown promise for identifying center-level irregularities in multi-center clinical trials with minimal human intervention. We aimed to characterize the form-level data irregularities in several historical clinical trials and to evaluate the ability of a machine learning-based outlier detection algorithm to identify them.

Methods: Data irregularities previously identified by humans in historical clinical trials were ascertained by comparing preliminary snapshots of the trial databases to the final, locked databases. We measured the ability of a machine learning-based outlier detection algorithm to identify form-level irregularities using concordance (area under the receiver operating characteristic curve), positive predictive value (precision), and sensitivity (recall).

Results: We examined preliminary snapshots of seven historical clinical trials, which randomized a total of 77,001 participants. Across all snapshots and trials, we extracted 1,267,484 completed entries from 358 case report forms containing irregularities; these entries included 24,850 form-wide irregularities (median per-form irregularity rate: 1.81%). Our proposed machine learning algorithm detected form-level irregularities with a median concordance of 0.74 (interquartile range = 0.57-0.89), slightly exceeding a previously proposed machine learning approach, which achieved a median area under the receiver operating characteristic curve of 0.73 (interquartile range = 0.54-0.88).

Conclusion: Data irregularities in historical clinical trials were ascertained by comparing preliminary snapshots of the trial database to the final database. These irregularities can be categorized according to their scope. Irregularities can be successfully detected by a machine learning algorithm as early as, or earlier than, a human can, without human intervention. Such an approach may complement existing techniques for central statistical monitoring in large multi-center randomized controlled trials and could improve the efficiency of costly data verification processes.
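The evaluation pipeline described in the abstract (label entries by comparing a preliminary snapshot with the locked database, score each entry with an outlier detector, then report concordance, precision, and recall) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the snapshot layout (a dict keyed by hypothetical `(form, participant)` tuples), the helper names, the toy vitals data, and the 1.5 flagging threshold are all illustrative. `max_abs_zscore` is a simple z-score stand-in swapped in for the paper's machine learning detector, which the abstract does not describe.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumption: a snapshot maps a hypothetical entry key, e.g. (form_name, participant_id),
# to the tuple of numeric field values recorded on that completed form.
Snapshot = Dict[Hashable, Tuple[float, ...]]


def label_irregular(preliminary: Snapshot, final: Snapshot) -> Dict[Hashable, int]:
    """Form-level label: 1 if any field was edited, or the entry deleted, before lock."""
    return {key: int(final.get(key) != values) for key, values in preliminary.items()}


def max_abs_zscore(entries: List[Tuple[float, ...]]) -> List[float]:
    """Stand-in outlier score: the most extreme per-field |z| (NOT the paper's algorithm)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*entries)]
    return [
        max(abs(v - m) / sd if sd > 0 else 0.0 for v, (m, sd) in zip(row, stats))
        for row in entries
    ]


def concordance(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(irregular entry outscores a regular one), with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one irregular and one regular entry")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision (PPV) and recall (sensitivity) when entries scoring >= threshold are flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    true_pos = sum(flagged)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / sum(labels) if sum(labels) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (hypothetical): (systolic, diastolic) on a vitals form;
    # entry 3 holds a typo that was corrected before database lock.
    preliminary = {("vitals", 1): (120.0, 80.0), ("vitals", 2): (118.0, 79.0),
                   ("vitals", 3): (1200.0, 80.0), ("vitals", 4): (122.0, 81.0)}
    final = dict(preliminary)
    final[("vitals", 3)] = (120.0, 80.0)

    keys = list(preliminary)  # fixed order keeps scores and labels aligned
    label_map = label_irregular(preliminary, final)
    labels = [label_map[k] for k in keys]
    scores = max_abs_zscore([preliminary[k] for k in keys])
    print(concordance(scores, labels), precision_recall(scores, labels, threshold=1.5))
```

The pairwise concordance loop is quadratic in the number of entries, which is acceptable for illustration; at the scale reported above (over a million entries), a rank-based computation via the equivalent Mann-Whitney U statistic would be the practical choice.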
Pages: 178-187
Page count: 10