Corruption of the Pearson correlation coefficient by measurement error and its estimation, bias, and correction under different error models

Cited by: 112
Authors
Saccenti, Edoardo [1]
Hendriks, Margriet H. W. B. [2]
Smilde, Age K. [3]
Affiliations
[1] Wageningen Univ & Res, Lab Syst & Synthet Biol, Wageningen, Netherlands
[2] DSM Biotechnol Ctr, Delft, Netherlands
[3] Univ Amsterdam, Swammerdam Inst Life Sci, Biosyst Data Anal, Amsterdam, Netherlands
Keywords
METABOLOMICS; NETWORK; DESIGN
DOI
10.1038/s41598-019-57247-4
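Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract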
Correlation coefficients are widely used in the life sciences. They may serve simple exploratory analysis or the construction of association networks for visualization, but they are also basic ingredients of sophisticated multivariate data analysis methods. It is therefore important to have reliable estimates of correlation coefficients. In the modern life sciences, comprehensive measurement techniques are used to measure metabolites, proteins, gene expression and other types of data. All these measurement techniques have errors. Whereas simple measurements once came with simple errors, that is no longer the case: errors are heterogeneous, non-constant and not independent. This seriously degrades the quality of the estimated correlation coefficients. We discuss the different types of errors present in modern comprehensive life science data and show with theory, simulations and real-life data how they affect correlation coefficients. We also briefly discuss ways to improve the estimation of such coefficients.
Pages: 19