Sample-Size Planning for Multivariate Data: A Raman-Spectroscopy-Based Example

Times cited: 43
Authors
Ali, Nairveen [1, 2, 3]
Girnus, Sophie [1, 2]
Roesch, Petra [1, 2]
Popp, Juergen [1, 2, 3, 4, 5]
Bocklitz, Thomas [1, 2, 3]
Affiliations
[1] Friedrich Schiller Univ, Inst Phys Chem, Helmholtzweg 4, D-07743 Jena, Germany
[2] Friedrich Schiller Univ, Abbe Ctr Photon IPC, Helmholtzweg 4, D-07743 Jena, Germany
[3] Leibniz Inst Photon Technol IPHT, Albert Einstein Str 9, D-07745 Jena, Germany
[4] Jena Univ Hosp, CSCC, Erlanger Allee 101, D-07747 Jena, Germany
[5] Forschungscampus Jena, InfectoGnost, Philosophenweg 7, D-07743 Jena, Germany
Keywords
CLASSIFICATION; CALIBRATION; DISCRIMINATION; IDENTIFICATION; ACCURACY
DOI
10.1021/acs.analchem.8b02167
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
The goal of sample-size planning (SSP) is to determine the number of measurements needed for statistical analysis. This SSP is necessary to achieve robust and significant results with a minimal number of measurements that need to be collected. SSP is a common procedure for univariate measurements, whereas for multivariate measurements, like spectra or time traces, no general sample-size-planning method exists. Sample-size planning becomes more important for biospectroscopic data because the data generation is time-consuming and costly. Additionally, ethical reasons do not allow the use of unnecessary samples and the measurement of unnecessary spectra. In this paper, a general sample-size-planning algorithm is presented that is based on learning curves. The learning curve quantifies the improvement of a classifier for an increasing training-set size. These curves are fitted by the inverse-power law, and the parameters of this fit can be utilized to predict the necessary training-set size. Sample-size planning is demonstrated for a biospectroscopic task of differentiating six different bacterial species, including Escherichia coli, Klebsiella terrigena, Pseudomonas stutzeri, Listeria innocua, Staphylococcus warneri, and Staphylococcus cohnii, on the basis of their Raman spectra. Thereby, we estimate the required number of Raman spectra and biological replicates to train a classification model, which consists of principal-component analysis (PCA) combined with linear-discriminant analysis (LDA). The presented algorithm revealed that 142 Raman spectra per species and seven biological replicates are needed for the above-mentioned biospectroscopic-classification task. Even though it was not demonstrated, the learning-curve-based sample-size-planning algorithm can be applied to any multivariate data and in particular to biospectroscopic-classification tasks.
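The learning-curve step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the three-parameter form err(n) = a * n^(-b) + c, the grid-search-plus-log-linear fitting procedure, the function names `fit_inverse_power_law` and `required_size`, and the synthetic error values, which are made up for illustration only.

```python
import math

def fit_inverse_power_law(sizes, errors, grid_steps=200):
    """Fit err(n) = a * n**(-b) + c (assumed form).

    For each candidate plateau c on a grid below min(errors), a and b are
    obtained by least squares on log(err - c) versus log(n); the candidate
    with the smallest squared error on the original scale is kept.
    """
    xs = [math.log(n) for n in sizes]
    x_mean = sum(xs) / len(xs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    c_max = min(errors)
    best = None
    for k in range(grid_steps):
        c = c_max * k / grid_steps  # stays strictly below min(errors)
        ys = [math.log(e - c) for e in errors]
        y_mean = sum(ys) / len(ys)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        a = math.exp(y_mean - slope * x_mean)
        b = -slope
        sse = sum((a * n ** (-b) + c - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

def required_size(a, b, c, target_error):
    """Smallest training-set size whose predicted error reaches target_error."""
    if target_error <= c:
        raise ValueError("target error is at or below the fitted plateau c")
    return math.ceil((a / (target_error - c)) ** (1.0 / b))

# Synthetic learning curve (hypothetical values, not data from the paper).
sizes = [10, 20, 40, 80, 160]
errors = [0.5 * n ** -0.5 + 0.05 for n in sizes]

a, b, c = fit_inverse_power_law(sizes, errors)
n_needed = required_size(a, b, c, target_error=0.08)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}, required n={n_needed}")
```

In practice the error values would come from repeated resampling of a classifier at each training-set size, and the extrapolated size is only as reliable as the fit; a target error below the fitted plateau c cannot be reached under this model.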
Pages: 12485-12492
Number of pages: 8