Polar labeling: silver standard algorithm for training disease classifiers

Cited by: 7
Authors
Wagholikar, Kavishwar B. [1]
Estiri, Hossein [1]
Murphy, Marykate [2]
Murphy, Shawn N. [1]
Affiliations
[1] Massachusetts Gen Hosp, Comp Sci Lab, Boston, MA 02114 USA
[2] Partners Healthcare, Somerville, MA 02145 USA
Keywords
GENERATION; RECORDS;
DOI
10.1093/bioinformatics/btaa088
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Expert-labeled data are essential to train phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and the costs remain prohibitive for scaling phenotyping to wider use cases. Results: We present an approach, referred to as polar labeling (PL), to create a silver standard for training machine learning (ML) models for disease classification. We test the hypothesis that ML models trained on the silver standard, created by applying PL to unlabeled patient records, are comparable in performance to ML models trained on the gold standard, created by clinical experts through manual review of patient records. We perform experimental validation using health records of 38 023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach.
Pages: 3200-3206
Number of pages: 7