Current breathomics: a review on data pre-processing techniques and machine learning in metabolomics breath analysis

Cited by: 154
Authors
Smolinska, A. [1, 2]
Hauschild, A-Ch [3]
Fijten, R. R. R. [1]
Dallinga, J. W. [1]
Baumbach, J. [4]
van Schooten, F. J. [1]
Affiliations
[1] Maastricht Univ, Nutr & Toxicol Res Inst Maastricht NUTRIM, Dept Toxicol, NL-6200 MD Maastricht, Netherlands
[2] Top Inst Food & Nutr, Wageningen, Netherlands
[3] Max Planck Inst Informat, Computat Syst Biol Grp, D-66123 Saarbrucken, Germany
[4] Univ Southern Denmark, Dept Math & Comp Sci, Computat Biol Grp, Odense, Denmark
Keywords
GC-MS; MCC-IMS; exhaled air; multivariate analysis; volatile organic compounds (VOCs); VOLATILE ORGANIC-COMPOUNDS; FLIGHT MASS-SPECTROMETER; EXHALED BREATH; LUNG-CANCER; BIOLOGICAL DATA; VARIABLE SELECTION; RETENTION TIME; BIOMARKERS; PEAK; NOSE
DOI
10.1088/1752-7155/8/2/027105
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
We define breathomics as the metabolomics study of exhaled air. It is a strongly emerging metabolomics research field that mainly focuses on health-related volatile organic compounds (VOCs). Since the amount of these compounds varies with health status, breathomics holds great promise to deliver non-invasive diagnostic tools. Thus, the main aim of breathomics is to find patterns of VOCs related to abnormal (for instance inflammatory) metabolic processes occurring in the human body. Recently, analytical methods for measuring VOCs in exhaled air with high resolution and high throughput have been extensively developed. Yet, the application of machine learning methods for fingerprinting VOC profiles in breathomics is still in its infancy. Therefore, in this paper, we describe the current state of the art in data pre-processing and multivariate analysis of breathomics data. We start with the detailed pre-processing pipelines for breathomics data obtained from gas chromatography-mass spectrometry and an ion-mobility spectrometer coupled to multi-capillary columns. The outcome of data pre-processing is a matrix containing the relative abundances of a set of VOCs for a group of patients under different conditions (e.g. disease stage, treatment). Independently of the utilized analytical method, the most important question, 'which VOCs are discriminatory?', remains the same. Answers can be given by several modern machine learning techniques (multivariate statistics), which are therefore the focus of this paper. We demonstrate the advantages as well as the drawbacks of such techniques. We aim to help the community to understand how to profit from a particular method. In parallel, we hope to make the community aware of the existing data fusion methods, as yet unresearched in breathomics.
Pages: 20
Related papers
130 in total
  • [1] Exhaled volatile organic compounds identify patients with colorectal cancer
    Altomare, D. F.
    Di Lena, M.
    Porcelli, F.
    Trizio, L.
    Travaglio, E.
    Tutino, M.
    Dragonieri, S.
    Memeo, V.
    de Gennaro, G.
    [J]. BRITISH JOURNAL OF SURGERY, 2013, 100 (01) : 144 - 151
  • [2] A prediction model for COPD readmissions: catching up, catching our breath, and improving a national problem
    Amalakuhan, Bravein
    Kiljanek, Lukasz
    Parvathaneni, Arvin
    Hester, Michael
    Cheriyath, Pramil
    Fischman, Daniel
    [J]. JOURNAL OF COMMUNITY HOSPITAL INTERNAL MEDICINE PERSPECTIVES, 2012, 2 (01):
  • [3] Artificial neural networks in medical diagnosis
    Amato, Filippo
    Lopez, Alberto
    Pena-Mendez, Eladia Maria
    Vanhara, Petr
    Hampl, Ales
    Havel, Josef
    [J]. JOURNAL OF APPLIED BIOMEDICINE, 2013, 11 (02) : 47 - 58
  • [4] Reducing over-optimism in variable selection by cross-model validation
    Anderssen, Endre
    Dyrstad, Knut
    Westad, Frank
    Martens, Harald
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2006, 84 (1-2) : 69 - 74
  • [5] [Anonymous], 1988, Principles of Multivariate Analysis
  • [6] [Anonymous], 1973, Pattern Classification and Scene Analysis
  • [7] [Anonymous], 1998, HDB CHEMOMETRICS Q A
  • [8] Bach FR, 2008, J MACH LEARN RES, V9, P1179
  • [9] Reduction of ion mobility spectrometry data by clustering characteristic peak structures
    Bader, Sabine
    Urfer, Wolfgang
    Baumbach, Jorg Ingo
    [J]. JOURNAL OF CHEMOMETRICS, 2006, 20 (3-4) : 128 - 135
  • [10] Bader, Sabine, 2008, Identification and Quantification of Peaks in Spectrometric Data