Interrater Reliability of Experts in Identifying Interictal Epileptiform Discharges in Electroencephalograms

Cited by: 81
Authors
Jing, Jin [1, 2]
Herlopian, Aline [1, 3]
Karakis, Ioannis [4 ]
Ng, Marcus [5 ]
Halford, Jonathan J. [6 ]
Lam, Alice [1 ]
Maus, Douglas [1 ]
Chan, Fonda [1 ]
Dolatshahi, Marjan [1 ]
Muniz, Carlos F. [1 ]
Chu, Catherine [1 ]
Sacca, Valeria [7 ]
Pathmanathan, Jay [1, 8]
Ge, WenDong [1 ]
Sun, Haoqi [1 ]
Dauwels, Justin [2 ]
Cole, Andrew J. [1 ]
Hoch, Daniel B. [1 ]
Cash, Sydney S. [1 ]
Westover, M. Brandon [1 ]
Affiliations
[1] Massachusetts Gen Hosp, Dept Neurol, Div Clin Neurophysiol, 55 Fruit St, Boston, MA 02114 USA
[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[3] Yale Sch Med, Dept Neurol, New Haven, CT USA
[4] Emory Univ, Sch Med, Dept Neurol, Atlanta, GA 30322 USA
[5] Univ Manitoba, Dept Neurol, Winnipeg, MB, Canada
[6] Med Univ South Carolina, Dept Neurol, Charleston, SC 29425 USA
[7] Magna Graecia Univ Catanzaro, Dept Med & Surg Sci, Dept Neurol, Catanzaro, Italy
[8] Hosp Univ Penn, Dept Neurol, 3400 Spruce St, Philadelphia, PA 19104 USA
Keywords
STATISTICAL TURING TEST; SPIKE DETECTION; EEG; EPILEPSY; RECOGNITION; AGREEMENT; SYSTEM;
DOI
10.1001/jamaneurol.2019.3531
Chinese Library Classification number
R74 [Neurology and Psychiatry];
Discipline classification code
Abstract
Importance: The validity of using electroencephalograms (EEGs) to diagnose epilepsy requires reliable detection of interictal epileptiform discharges (IEDs). Prior interrater reliability (IRR) studies are limited by small samples and selection bias.
Objective: To assess the reliability of experts in detecting IEDs in routine EEGs.
Design, Setting, and Participants: This prospective analysis, conducted in 2 phases, included as participants physicians with at least 1 year of subspecialty training in clinical neurophysiology. In phase 1, 9 experts independently identified candidate IEDs in 991 EEGs (1 expert per EEG) reported in the medical record to contain at least 1 IED, yielding 87636 candidate IEDs. In phase 2, the candidate IEDs were clustered into groups with distinct morphological features, yielding 12602 clusters, and a representative candidate IED was selected from each cluster. We added 660 waveforms (11 random samples each from 60 randomly selected EEGs reported as being free of IEDs) as negative controls. Eight experts independently scored all 13262 candidates as IEDs or non-IEDs. The 1051 EEGs in the study were recorded at the Massachusetts General Hospital between 2012 and 2016.
Main Outcomes and Measures: Primary outcome measures were percentage of agreement (PA) and beyond-chance agreement (Gwet kappa) for individual IEDs (IED-wise IRR) and for whether an EEG contained any IEDs (EEG-wise IRR). Secondary outcomes were the correlations between numbers of IEDs marked by experts across cases, calibration of expert scoring to group consensus, and receiver operating characteristic analysis of how well multivariate logistic regression models may account for differences in the IED scoring behavior between experts.
Results: Among the 1051 EEGs assessed in the study, 540 (51.4%) were those of females and 511 (48.6%) were those of males. In phase 1, 9 experts each marked potential IEDs in a median of 65 (interquartile range [IQR], 28-332) EEGs. The total number of IED candidates marked was 87636. Expert IRR for the 13262 individually annotated IED candidates was fair, with the mean PA being 72.4% (95% CI, 67.0%-77.8%) and mean kappa being 48.7% (95% CI, 37.3%-60.1%). The EEG-wise IRR was substantial, with the mean PA being 80.9% (95% CI, 76.2%-85.7%) and mean kappa being 69.4% (95% CI, 60.3%-78.5%). A statistical model based on waveform morphological features, when provided with individualized thresholds, explained the median binary scores of all experts with a high degree of accuracy of 80% (range, 73%-88%).
Conclusions and Relevance: This study's findings suggest that experts can identify whether EEGs contain IEDs with substantial reliability. Lower reliability regarding individual IEDs may be largely explained by various experts applying different thresholds to a common underlying statistical model.
Summary: This study assesses the reliability of subspecialty-trained clinical neurophysiologists in detecting interictal epileptiform discharges in routine electroencephalograms (EEGs) recorded from 2012 to 2016.
Question: What is the reliability of subspecialty-trained clinical neurophysiologists in detecting interictal epileptiform discharges in routine electroencephalograms?
Findings: In this multicenter study, 8 experts independently annotated 13262 candidate interictal epileptiform discharges. Interrater reliability for individual interictal epileptiform discharges was fair (kappa = 48.7%), whereas that for whether a given electroencephalogram contained any interictal epileptiform discharges was substantial (kappa = 69.4%).
Meaning: This study's findings suggest that experts can identify electroencephalograms containing interictal epileptiform discharges with substantial reliability and that disagreements about individual interictal epileptiform discharges can be largely explained by various experts applying different thresholds to a common underlying statistical model.
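The two primary outcome measures named in the abstract, percentage of agreement (PA) and Gwet's beyond-chance agreement, can be illustrated for binary IED/non-IED scores. The following is a minimal sketch, not the authors' analysis code. It assumes that "Gwet kappa" refers to Gwet's first-order agreement coefficient (AC1) and that multi-rater values are obtained by averaging over rater pairs; the function names and the toy scores are hypothetical.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of items on which two raters give the same binary score."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gwet_ac1(a, b):
    """Gwet's AC1 for two raters and two categories (1 = IED, 0 = non-IED).

    Chance agreement is pe = 2 * pi * (1 - pi), where pi is the proportion
    of positive scores averaged over both raters. For binary data pe never
    exceeds 0.5, so the denominator below cannot be zero.
    """
    pa = percent_agreement(a, b)
    pi = (sum(a) + sum(b)) / (2 * len(a))
    pe = 2 * pi * (1 - pi)
    return (pa - pe) / (1 - pe)


def mean_pairwise(scores, metric):
    """Average a two-rater metric over every pair of raters (an assumption
    about how a single multi-rater summary could be formed)."""
    pairs = list(combinations(scores, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Toy example: three hypothetical raters scoring eight candidate waveforms.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1]
rater_c = [1, 1, 0, 0, 1, 0, 1, 0]
toy_scores = [rater_a, rater_b, rater_c]

if __name__ == "__main__":
    print("mean PA :", mean_pairwise(toy_scores, percent_agreement))
    print("mean AC1:", mean_pairwise(toy_scores, gwet_ac1))
```

Because AC1 subtracts an estimate of chance agreement from PA, it is never larger than PA on the same data, which matches the pattern of the reported values (for example, PA 72.4% against kappa 48.7% for individual IEDs).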
Pages: 49-57
Page count: 9