Weighted pivot coordinates for partial least squares-based marker discovery in high-throughput compositional data

Cited by: 7
Authors
Stefelova, Nikola [1]
Palarea-Albaladejo, Javier [2]
Hron, Karel [1]
Affiliations
[1] Palacky Univ, Fac Sci, 17 Listopadu 12, Olomouc 77146, Czech Republic
[2] Biomath & Stat Scotland, Edinburgh, Midlothian, Scotland
Keywords
compositional data; high-throughput data; log-ratio analysis; marker discovery; PLS regression; METHANE EMISSIONS; ROUNDED ZEROS; REGRESSION; PACKAGE; MODEL;
DOI
10.1002/sam.11514
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
High-throughput data representing large mixtures of chemical or biological signals are routinely produced in the molecular sciences. Given a number of samples, partial least squares (PLS) regression is a well-established statistical method for investigating associations between the signals and any continuous response variables of interest. However, technical artifacts generally make the raw signals not directly comparable between samples, so data normalization is required before any meaningful scientific information can be drawn. Normalization often allows the processed signals to be characterized as compositional data, where the relevant information is contained in the pairwise log-ratios between the components of the mixture. The (log-ratio) pivot coordinate approach aggregates the pairwise log-ratios of a component to all the remaining components into a single variable. This simplifies interpretation and the investigation of the components' relative importance but, particularly in a high-dimensional context, the aggregated log-ratios can easily mix up information from different underlying processes. In this context, we propose a weighting strategy for the construction of pivot coordinates for PLS regression which draws on the correlation between the response variable and the pairwise log-ratios. Using real and simulated data sets, we demonstrate that this proposal enhances the discovery of biological markers in high-throughput compositional data.
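As an illustration of the aggregation the abstract describes, the following minimal sketch computes the standard (unweighted) pivot coordinate of one component and checks that it equals the scaled sum of that component's pairwise log-ratios to all other components. It does not implement the weighting strategy proposed in the paper, whose details are not given in this record. The function names and the example composition are hypothetical and chosen only for illustration.

```python
import math


def first_pivot_coordinate(x, k=0):
    """Standard pivot coordinate of part k: scaled log-ratio of x[k]
    to the geometric mean of the remaining parts."""
    D = len(x)
    others = [x[j] for j in range(D) if j != k]
    log_gmean = sum(math.log(v) for v in others) / (D - 1)
    return math.sqrt((D - 1) / D) * (math.log(x[k]) - log_gmean)


def aggregated_pairwise_logratios(x, k=0):
    """Same quantity, written as an equally weighted aggregation of the
    pairwise log-ratios ln(x[k] / x[j]) over all j != k."""
    D = len(x)
    total = sum(math.log(x[k] / x[j]) for j in range(D) if j != k)
    return total / math.sqrt(D * (D - 1))


# Hypothetical 3-part composition (illustrative values only).
composition = [0.5, 0.3, 0.2]
z_pivot = first_pivot_coordinate(composition)
z_pairs = aggregated_pairwise_logratios(composition)

# Rescaling every part by the same factor leaves the coordinate unchanged,
# which is why only relative information (log-ratios) matters.
z_scaled = first_pivot_coordinate([10 * v for v in composition])
```

In this unweighted form every pairwise log-ratio contributes equally to the coordinate, which is where information from unrelated processes can be mixed together in high dimensions.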
Pages: 315-330
Number of pages: 16