Analysis of -omics data: Graphical interpretation- and validation tools in multi-block methods

Times cited: 39
Authors
Hassani, Sahar [1, 3]
Martens, Harald [1, 2, 3]
Qannari, El Mostafa [4]
Hanafi, Mohamed [4]
Borge, Grethe Iren [1]
Kohler, Achim [1]
Affiliations
[1] Nofima Mat AS, Ctr Biospect & Data Modelling, N-1430 Ås, Norway
[2] Univ Life Sci, CIGENE Ctr Integrat Genet, N-1432 Ås, Norway
[3] Norwegian Univ Life Sci, Dept Math Sci & Technol IMT, N-1432 Ås, Norway
[4] ONIRIS, Unite Sensometrie & Chimiometrie, F-44322 Nantes 3, France
Keywords
Omics data; Multi-block methods; Validation tools; Mass spectrometry; Systems biology; Drug discovery; Infrared spectroscopy; ANOVA-PCA; Metabolomics; Proteomics; Genomics; Impact; Biomarker
DOI
10.1016/j.chemolab.2010.08.008
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
As systems biology develops, various types of high-throughput -omics data are rapidly becoming available. An increasing challenge is to analyze such massive data, interpret the results and validate the findings. Data analysis for most omics techniques is still at an immature stage. The dimensionality of the data tables alone calls for new ways to reveal structure in the data without cognitive overload and without an excessive false discovery rate. Multi-block methods have been developed and adapted to find common variation patterns in data, to depict these findings in graphical displays, and to provide tools that aid interpretation of the outcomes. In particular, multi-block methods based on latent variables are powerful tools for studying block and global variation patterns, e.g. by inspecting block and global score plots. These methods give an efficient graphical overview of sample and variable variation patterns. However, visual detection of patterns may be subjective, and there is therefore a need for validation tools. In this paper, tools for validating visually identified patterns in multi-block results are presented. Cross-validated estimates of the root mean square error (RMSE) for block results are introduced to estimate the number of relevant principal components (PCs) in Consensus Principal Component Analysis (CPCA) models. Furthermore, important variables are identified by approximate t-tests based on Procrustes-corrected jackknifing. To assess the stability of score patterns, block stability plots are introduced. Stability plots also reveal outliers graphically at the block and global levels. © 2010 Elsevier B.V. All rights reserved.
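The abstract names two computational ingredients: a CPCA model with block and global (super) scores, and approximate t-tests from Procrustes-corrected jackknifing. The NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes each block is column-centred and scaled to unit Frobenius norm, computes CPCA as PCA of the concatenated blocks with deflation by the super score (for the CPCA-W variant this is reported to give the same super scores as the iterative algorithm), and takes each block score as the block projected onto its normalised segment of the global loading. The jackknife uses contiguous cross-validation segments, rotates every sub-model's loadings onto the full-model loadings by orthogonal Procrustes, and divides each loading by its jackknife standard error. The function names (`scale_blocks`, `cpca`, `procrustes_rotation`, `jackknife_t`), the segment count and the simulated data are hypothetical; the paper's cross-validated RMSE and block stability plots are not reproduced here.

```python
import numpy as np


def scale_blocks(blocks):
    """Centre each block column-wise and scale it to unit Frobenius norm.

    Assumption: equal total variance per block; the paper may scale differently.
    """
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        scaled.append(Xc / np.linalg.norm(Xc))
    return scaled


def cpca(blocks, n_comp):
    """Sequential CPCA on pre-scaled blocks (assumed CPCA-W formulation).

    Returns super scores T (n x A), global loadings P (total vars x A) and
    one block-score matrix (n x A) per block.
    """
    Xs = [X.copy() for X in blocks]
    n = Xs[0].shape[0]
    T = np.zeros((n, n_comp))
    P = np.zeros((sum(X.shape[1] for X in Xs), n_comp))
    block_T = [np.zeros((n, n_comp)) for _ in Xs]
    for a in range(n_comp):
        X = np.hstack(Xs)
        U, S, _ = np.linalg.svd(X, full_matrices=False)
        t = U[:, 0] * S[0]              # super score: first PC of the concatenation
        p = X.T @ t / (t @ t)           # global loading (unit norm)
        T[:, a], P[:, a] = t, p
        start = 0
        for b in range(len(Xs)):
            stop = start + Xs[b].shape[1]
            pb = p[start:stop]          # this block's segment of the loading
            norm_pb = max(np.linalg.norm(pb), np.finfo(float).eps)
            block_T[b][:, a] = Xs[b] @ pb / norm_pb
            Xs[b] = Xs[b] - np.outer(t, pb)  # deflate the block by the super score
            start = stop
    return T, P, block_T


def procrustes_rotation(A, B):
    """Orthogonal R minimising ||A @ R - B||_F (SVD solution)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def jackknife_t(blocks, n_comp, n_segments):
    """Approximate t-values for the global loadings (hypothetical helper).

    Each cross-validation sub-model is rotated onto the full model before
    its deviation is accumulated, which removes sign flips and rotations.
    """
    _, P_full, _ = cpca(scale_blocks(blocks), n_comp)
    n = blocks[0].shape[0]
    sq = np.zeros_like(P_full)
    for seg in np.array_split(np.arange(n), n_segments):
        keep = np.setdiff1d(np.arange(n), seg)
        _, P_m, _ = cpca(scale_blocks([X[keep] for X in blocks]), n_comp)
        R = procrustes_rotation(P_m, P_full)
        sq += (P_m @ R - P_full) ** 2
    M = n_segments
    s = np.sqrt(sq * (M - 1) / M)       # jackknife standard error
    return P_full / s


# Illustrative use on simulated data (two blocks sharing two latent factors).
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
blocks = [latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(30, 8)),
          latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(30, 5))]
T, P, block_T = cpca(scale_blocks(blocks), n_comp=2)
t_vals = jackknife_t(blocks, n_comp=2, n_segments=10)
```

Variables whose loadings have large absolute t-values would be the candidates for "important variables"; any cut-off (for example |t| > 2) is a rule of thumb assumed here, not a value taken from the paper.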
Pages: 140-153
Page count: 14