Corruption of the Pearson correlation coefficient by measurement error and its estimation, bias, and correction under different error models

Authors
Edoardo Saccenti
Margriet H. W. B. Hendriks
Age K. Smilde
Affiliations
[1] Wageningen University & Research, Laboratory of Systems and Synthetic Biology
[2] DSM Biotechnology Center
[3] University of Amsterdam, Biosystems Data Analysis, Swammerdam Institute for Life Sciences
Source
Scientific Reports, vol. 10
Abstract
Correlation coefficients are widely used in the life sciences. They may serve only for simple exploratory analysis or for constructing association networks for visualization, but they are also basic ingredients of sophisticated multivariate data analysis methods. It is therefore important to have reliable estimates of correlation coefficients. In the modern life sciences, comprehensive measurement techniques are used to measure metabolites, proteins, gene expression and other types of data. All of these measurement techniques have errors. In the past, simple measurements came with simple errors; that is no longer the case. Errors are now heterogeneous, non-constant and not independent, which seriously degrades the quality of the estimated correlation coefficients. We discuss the different types of errors present in modern comprehensive life science data and show, with theory, simulations and real-life data, how they affect correlation coefficients. We also briefly discuss ways to improve the estimation of such coefficients.