Preprocessing, classification modeling and feature selection using flow injection electrospray mass spectrometry metabolite fingerprint data

Cited by: 79
Authors
Enot, David P. [1]
Lin, Wanchang [1]
Beckmann, Manfred [1]
Parker, David [1]
Overy, David P. [1]
Draper, John [1]
Affiliations
[1] Aberystwyth Univ, Inst Biol Sci, Aberystwyth SY23 3DA, Dyfed, Wales
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
DOI
10.1038/nprot.2007.511
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Metabolome analysis by flow injection electrospray mass spectrometry (FIE-MS) fingerprinting generates measurements relating to large numbers of m/z signals. Such data sets often exhibit high variance with a paucity of replicates, thus providing a challenge for data mining. We describe data preprocessing and modeling methods that have proved reliable in projects involving samples from a range of organisms. The protocols interact with software resources specifically for metabolomics provided in a Web-accessible data analysis package FIEmspro (http://users.aber.ac.uk/jhd) written in the R environment and requiring a moderate knowledge of R command-line usage. Specific emphasis is placed on describing the outcome of modeling experiments using FIE-MS data that require further preprocessing to improve quality. The salient features of both poor and robust (i.e., highly generalizable) multivariate models are outlined together with advice on validating classifiers and avoiding false discovery when seeking explanatory variables.
Pages: 446-470
Page count: 25