A test for treatment effects in randomized controlled trials, harnessing the power of ultrahigh dimensional big data

Cited by: 4
Authors
Lee, Wen-Chung [1]
Lin, Jui-Hsiang [1]
Affiliations
[1] Natl Taiwan Univ, Coll Publ Hlth, Inst Epidemiol & Prevent Med, Taipei, Taiwan
Keywords
big data; biostatistics; data mining; potential-outcome model; randomized controlled trial; sample size; sharp null; geometric representation; selection
DOI
10.1097/MD.0000000000017630
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: The randomized controlled trial (RCT) is the gold-standard research design in biomedicine. However, practical concerns often limit the sample size, n, the number of patients in an RCT. We aim to show that the power of an RCT can be increased by increasing p, the number of baseline covariates (sex, age, and the socio-demographic, genomic, and clinical profiles of the patients, among others) collected in the RCT (referred to as the 'dimension'). Methods: The conventional test for treatment effects is based on testing the 'crude null' that the outcomes of the subjects do not differ between the two arms of an RCT. We propose a 'high-dimensional test' based on testing the 'sharp null' that the experimental intervention has no treatment effect whatsoever for patients of any covariate profile. Results: Using computer simulations, we show that the high-dimensional test can become very powerful in detecting treatment effects when p is very large, but not when p is small or moderate. Using a real dataset, we demonstrate that the P value of the high-dimensional test decreases as the number of baseline covariates increases, though it remains non-significant. Conclusion: In this big-data era, pushing the p of an RCT to the millions, billions, or even trillions may someday become feasible, and the high-dimensional test proposed in this study can then become very powerful in detecting treatment effects.
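Illustration (not part of the original record): the abstract does not specify how the high-dimensional test is computed, so the sketch below does not implement it. It only shows what testing a 'sharp null' means, using the classical Fisher randomization test without covariates. Under the sharp null every patient's outcome is the same whichever arm they are assigned to, so the arm labels can be re-drawn and the observed difference in means compared with its re-randomization distribution. The function name, parameters, and toy data are hypothetical.

```python
import random


def randomization_p_value(outcomes, treated, n_draws=2000, seed=0):
    """Monte Carlo Fisher randomization test of the sharp null.

    Hypothetical helper for illustration only: a two-sided test on the
    difference in mean outcome between arms, ignoring covariates.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    n_treated = sum(1 for a in treated if a)

    def mean_diff(assign):
        t = [y for y, a in zip(outcomes, assign) if a]
        c = [y for y, a in zip(outcomes, assign) if not a]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(mean_diff(treated))
    hits = 0
    for _ in range(n_draws):
        # Under the sharp null the outcomes are fixed; only labels are re-drawn.
        chosen = set(rng.sample(range(n), n_treated))
        assign = [i in chosen for i in range(n)]
        if abs(mean_diff(assign)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_draws + 1)


# Toy data (made up): four treated patients, four controls.
p_effect = randomization_p_value([5, 6, 7, 8, 1, 2, 3, 4], [1, 1, 1, 1, 0, 0, 0, 0])
p_flat = randomization_p_value([3, 3, 3, 3, 3, 3, 3, 3], [1, 1, 1, 1, 0, 0, 0, 0])
```

With identical outcomes in both arms, every re-randomization ties the observed statistic and the p-value is 1. With clearly separated arms, few re-randomizations are as extreme as the observed one and the p-value is small.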
Pages: 7
Related papers
19 in total
[1]   The high-dimension, low-sample-size geometric representation holds under mild conditions [J].
Ahn, Jeongyoun ;
Marron, J. S. ;
Muller, Keith M. ;
Chi, Yueh-Yun .
BIOMETRIKA, 2007, 94 (03) :760-766
[2]   From big data analysis to personalized medicine for all: challenges and opportunities [J].
Alyass, Akram ;
Turcotte, Michelle ;
Meyre, David .
BMC MEDICAL GENOMICS, 2015, 8
[3]   [Anonymous], 2008, Modern epidemiology
[4]   Bounds on causal effects in randomized trials with noncompliance under monotonicity assumptions about covariates [J].
Chiba, Yasutaka .
STATISTICS IN MEDICINE, 2009, 28 (26) :3249-3259
[5]   KERNEL DIMENSION REDUCTION IN REGRESSION [J].
Fukumizu, Kenji ;
Bach, Francis R. ;
Jordan, Michael I. .
ANNALS OF STATISTICS, 2009, 37 (04) :1871-1905
[6]   Geometric representation of high dimension, low sample size data [J].
Hall, P ;
Marron, JS ;
Neeman, A .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2005, 67 :427-444
[7]   Matrix variate logistic regression model with application to EEG data [J].
Hung, Hung ;
Wang, Chen-Chien .
BIOSTATISTICS, 2013, 14 (01) :189-202
[8]   Clinical research methodology I: Introduction to randomized trials [J].
Kao, Lillian S. ;
Tyson, Jon E. ;
Blakely, Martin L. ;
Lally, Kevin P. .
JOURNAL OF THE AMERICAN COLLEGE OF SURGEONS, 2008, 206 (02) :361-369
[9]   Big Data And New Knowledge In Medicine: The Thinking, Training, And Tools Needed For A Learning Health System [J].
Krumholz, Harlan M. .
HEALTH AFFAIRS, 2014, 33 (07) :1163-1170
[10]   INTRODUCTION TO SAMPLE-SIZE DETERMINATION AND POWER ANALYSIS FOR CLINICAL-TRIALS [J].
LACHIN, JM .
CONTROLLED CLINICAL TRIALS, 1981, 2 (02) :93-113