Exploring scoring methods for research studies: Accuracy and variability of visual and automated sleep scoring

Cited by: 29
Authors
Berthomier, Christian [1]
Muto, Vincenzo [2, 3, 4]
Schmidt, Christina [2, 4]
Vandewalle, Gilles [2]
Jaspar, Mathieu [2, 3, 4]
Devillers, Jonathan [2, 3]
Gaggioni, Giulia [2]
Chellappa, Sarah L. [2]
Meyer, Christelle [2, 3]
Phillips, Christophe [2, 5]
Salmon, Eric [2]
Berthomier, Pierre [1]
Prado, Jacques [1]
Benoit, Odile [1]
Bouet, Romain [6]
Brandewinder, Marie [1]
Mattout, Jeremie [6]
Maquet, Pierre [2, 3, 7]
Affiliations
[1] PHYSIP, 6 Rue Gobert, F-75011 Paris, France
[2] Univ Liege, GIGA Cyclotron Res Ctr Vivo Imaging, Allee 6 Aout,Batiment B30, B-4000 Liege, Belgium
[3] Walloon Excellence Life Sci & Biotechnol WELBIO, Liege, Belgium
[4] Univ Liege, Psychol & Cognit Neurosci Res Unit, Liege, Belgium
[5] Univ Liege, Dept Elect Engn & Comp Sci, Liege, Belgium
[6] Univ Lyon 1, Lyon Neurosci Res Ctr, INSERM U1028, CNRS UMR 5292, Lyon, France
[7] CHU Liege, Dept Neurol, Liege, Belgium
Keywords
automatic scoring; large datasets; scoring variability; visual scoring; INTER-SCORER RELIABILITY; RESPIRATORY EVENTS; AMERICAN ACADEMY; VALIDATION; AGREEMENT; CLASSIFICATION; POLYSOMNOGRAMS; RECHTSCHAFFEN; PERFORMANCE; MEDICINE;
DOI
10.1111/jsr.12994
Chinese Library Classification (CLC) number
R74 [Neurology and Psychiatry];
Discipline classification code
Abstract
Sleep studies face new challenges in terms of data, objectives and metrics. This requires reappraising the adequacy of existing analysis methods, including scoring methods. Visual and automatic sleep scoring of healthy individuals were compared in terms of reliability (i.e., accuracy and stability) to find a scoring method capable of giving access to the actual data variability without adding exogenous variability. A first dataset (DS1, four recordings) scored by six experts plus an autoscoring algorithm was used to characterize inter-scoring variability. A second dataset (DS2, 88 recordings) scored a few weeks later was used to explore intra-expert variability. Percentage agreements and Conger's kappa were derived from epoch-by-epoch comparisons on pairwise and consensus scorings. On DS1 the number of epochs of agreement decreased when the number of experts increased, ranging from 86% (pairwise) to 69% (all experts). Adding autoscoring to visual scorings changed the kappa value from 0.81 to 0.79. Agreement between expert consensus and autoscoring was 93%. On DS2 the hypothesis of intra-expert variability was supported by a systematic decrease in kappa scores between autoscoring used as reference and each single expert between datasets (0.75 to 0.70). Although visual scoring induces inter- and intra-expert variability, autoscoring methods can cope with intra-scorer variability, making them a sensible option to reduce exogenous variability and give access to the endogenous variability in the data.
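The agreement metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the stage labels and the toy hypnograms below are assumptions made purely for illustration, and the kappa follows the standard complete-data form of Conger's (1980) multi-rater kappa (mean pairwise observed agreement, corrected by the mean pairwise chance agreement computed from each scorer's own stage proportions).

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of epochs on which two scorings assign the same stage."""
    if len(a) != len(b):
        raise ValueError("scorings must cover the same epochs")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def unanimous_agreement(scorings):
    """Fraction of epochs on which every scorer assigns the same stage."""
    n_epochs = len(scorings[0])
    return sum(len(set(epoch)) == 1 for epoch in zip(*scorings)) / n_epochs


def conger_kappa(scorings):
    """Conger's kappa for two or more scorers who all scored every epoch."""
    n_epochs = len(scorings[0])
    pairs = list(combinations(range(len(scorings)), 2))

    # Observed agreement: mean epoch-by-epoch agreement over all scorer pairs.
    p_obs = sum(
        percent_agreement(scorings[i], scorings[j]) for i, j in pairs
    ) / len(pairs)

    # Per-scorer stage proportions (each scorer keeps its own marginals).
    marginals = [
        {stage: count / n_epochs for stage, count in Counter(s).items()}
        for s in scorings
    ]

    # Chance agreement: mean over pairs of the summed products of marginals.
    p_exp = sum(
        sum(p * marginals[j].get(stage, 0.0) for stage, p in marginals[i].items())
        for i, j in pairs
    ) / len(pairs)

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical 8-epoch hypnograms (invented data, not taken from DS1 or DS2).
expert_a = ["W", "N1", "N2", "N2", "N3", "R", "R", "W"]
expert_b = ["W", "N2", "N2", "N2", "N3", "R", "N1", "W"]
autoscore = ["W", "N1", "N2", "N2", "N3", "R", "N1", "W"]

print(percent_agreement(expert_a, expert_b))
print(unanimous_agreement([expert_a, expert_b, autoscore]))
print(conger_kappa([expert_a, expert_b, autoscore]))
```

With exactly two scorers this kappa reduces to Cohen's kappa. Requiring unanimity across more scorers can only lower the fraction of agreed epochs, which is consistent with the drop from pairwise to all-expert agreement reported in the abstract.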
Pages: 11
Related papers
40 in total
  • [1] An E-health solution for automatic sleep classification according to Rechtschaffen and Kales: Validation study of the Somnolyzer 24 x 7 utilizing the Siesta database
    Anderer, P
    Gruber, G
    Parapatics, S
    Woertz, M
    Miazhynskaia, T
    Klösch, G
    Saletu, B
    Zeitlhofer, J
    Barbanoj, MJ
    Danker-Hopfe, H
    Himanen, SL
    Kemp, B
    Penzel, T
    Grözinger, M
    Kunz, D
    Rappelsberger, P
    Schlögl, A
    Dorffner, G
    [J]. NEUROPSYCHOBIOLOGY, 2005, 51 (03) : 115 - 133
  • [2] Computer-Assisted Sleep Classification according to the Standard of the American Academy of Sleep Medicine: Validation Study of the AASM Version of the Somnolyzer 24 x 7
    Anderer, Peter
    Moreau, Arnaud
    Woertz, Michael
    Ross, Marco
    Gruber, Georg
    Parapatics, Silvia
    Loretz, Erna
    Heller, Esther
    Schmidt, Andrea
    Boeck, Marion
    Moser, Doris
    Kloesch, Gerhard
    Saletu, Bernd
    Saletu-Zyhlarz, Gerda M.
    Danker-Hopfe, Heidi
    Zeitlhofer, Josef
    Dorffner, Georg
    [J]. NEUROPSYCHOBIOLOGY, 2010, 62 (04) : 250 - 264
  • [3] [Anonymous], 2007, AASM MANUAL SCORING
  • [4] [Anonymous], 2012, Handbook of inter-rater reliability
  • [5] Automatic analysis of single-channel sleep EEG: Validation in healthy individuals
    Berthomier, Christian
    Drouot, Xavier
    Herman-Stoieca, Maria
    Berthomier, Pierre
    Prado, Jacques
    Bokar-Thire, Djibril
    Benoit, Odile
    Mattout, Jeremie
    d'Ortho, Marie-Pia
    [J]. SLEEP, 2007, 30 (11) : 1587 - 1595
  • [6] Objective Prevalence of Insomnia in the Sao Paulo, Brazil Epidemiologic Sleep Study
    Castro, Laura S.
    Poyares, Dalva
    Leger, Damien
    Bittencourt, Lia
    Tufik, Sergio
    [J]. ANNALS OF NEUROLOGY, 2013, 74 (04) : 537 - 546
  • [7] Chediak A, 2006, J CLIN SLEEP MED, V2, P427
  • [8] A coefficient of agreement for nominal scales
    Cohen, J
    [J]. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 1960, 20 (01) : 37 - 46
  • [9] Scoring variability between polysomnography technologists in different sleep laboratories
    Collop, Nancy A.
    [J]. SLEEP MEDICINE, 2002, 3 (01) : 43 - 47
  • [10] Integration and generalization of kappas for multiple raters
    Conger, AJ
    [J]. PSYCHOLOGICAL BULLETIN, 1980, 88 (02) : 322 - 328