Dimension reduction of gene expression data

Cited by: 4
Authors
Lee J. [1]
Ciccarello S. [2]
Acharjee M. [3]
Das K. [3]
Affiliations
[1] Department of Mathematics and Statistics, James Madison University, Harrisonburg, VA
[2] Department of Mathematics and Statistics, Hollins University, Roanoke, VA
[3] Department of Mathematics, Lamar University, Beaumont, TX
Funding
National Science Foundation (United States)
Keywords
DNA methylation; elastic net regression; PLS regression; principal component analysis; supervised PCR; Y-aware PCR
DOI
10.1080/15598608.2017.1413456
Abstract
DNA methylation of specific dinucleotides has been shown to be strongly linked with tissue age. This research explores different analysis techniques for microarray data in order to build a more effective predictor of age from DNA methylation levels. Specifically, the study compares elastic net regression with four latent variable models (principal component regression, supervised principal component regression, Y-aware principal component regression, and partial least squares regression) on their ability to predict tissue age from DNA methylation levels. The elastic net model outperforms the latent variable models when each method uses fewer than ten components, but Y-aware principal component regression predicts more accurately (with a reasonably low testing RMSE) and captures more of the desired structure when the number of components increases to 20. Coding limitations prevented conclusive results about how supervised principal component regression performs as the number of components increases. © 2018 Grace Scientific Publishing, LLC.
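The contrast between ordinary and Y-aware principal component regression can be illustrated with a minimal sketch. This is not the authors' code or data: the function names (`pcr_fit_predict`, `rmse`, `demo`) and the synthetic data set are hypothetical, and the sketch assumes that "Y-aware" means rescaling each predictor by the slope of its univariate regression against the response before extracting components, which is one common formulation. Only NumPy is used.

```python
import numpy as np


def pcr_fit_predict(X_train, y_train, X_test, k, y_aware=False):
    """Principal component regression with k components.

    With y_aware=True, each centered predictor is first multiplied by the
    least-squares slope of y on that predictor (an assumed formulation of
    Y-aware scaling), so columns unrelated to y shrink toward zero.
    """
    # Center using training statistics only, to avoid test-set leakage.
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    yc = y_train - y_mean

    if y_aware:
        # Univariate slope of y on each predictor: sum(x*y) / sum(x*x).
        scale = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
    else:
        scale = np.ones(Xc.shape[1])

    Xs = Xc * scale
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:k].T

    # Regress the centered response on the first k component scores.
    Z = Xs @ V
    beta, *_ = np.linalg.lstsq(Z, yc, rcond=None)

    Z_test = ((X_test - x_mean) * scale) @ V
    return y_mean + Z_test @ beta


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def demo(seed=0):
    """Synthetic example: 3 informative low-variance columns,
    37 uninformative high-variance columns."""
    rng = np.random.default_rng(seed)
    n, p = 300, 40
    X = rng.normal(size=(n, p))
    X[:, 3:] *= 5.0  # nuisance columns dominate the total variance
    y = X[:, :3] @ np.array([1.0, -2.0, 1.5]) + rng.normal(scale=0.5, size=n)

    X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]
    return {
        "plain_pcr": rmse(y_te, pcr_fit_predict(X_tr, y_tr, X_te, k=3)),
        "y_aware_pcr": rmse(
            y_te, pcr_fit_predict(X_tr, y_tr, X_te, k=3, y_aware=True)
        ),
    }


if __name__ == "__main__":
    for name, score in demo().items():
        print(f"{name}: test RMSE = {score:.3f}")
```

In this constructed setting the leading components of plain PCR are driven by the high-variance nuisance columns, whereas the Y-aware scaling pushes the informative columns to the front. It says nothing about the methylation results reported in the abstract, which depend on the real data and on the number of components retained.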
Pages: 450-461
Number of pages: 11