-Omics biomarker identification pipeline for translational medicine

Cited by: 32
Authors
Bravo-Merodio, Laura [1, 2]
Williams, John A. [1, 2, 3]
Gkoutos, Georgios V. [1, 2, 4, 5, 6, 7]
Acharjee, Animesh [1, 2, 6]
Affiliations
[1] Univ Birmingham, Ctr Computat Biol, Inst Canc & Genom Sci, Coll Med & Dent Sci, Birmingham B15 2TT, W Midlands, England
[2] Univ Hosp Birmingham NHS Fdn Trust, Inst Translat Med, Birmingham B15 2TT, W Midlands, England
[3] MRC, Harwell Inst, Mammalian Genet Unit, Harwell Campus, Didcot OX11 0RD, Oxon, England
[4] MRC Hlth Data Res UK HDR UK, London, England
[5] NIHR Expt Canc Med Ctr, Birmingham B15 2TT, W Midlands, England
[6] NIHR Surg Reconstruct & Microbiol Res Ctr, Birmingham B15 2TT, W Midlands, England
[7] NIHR Biomed Res Ctr, Birmingham B15 2TT, W Midlands, England
Funding
US National Institutes of Health (NIH); EU Horizon 2020; US National Science Foundation (NSF); Wellcome Trust (UK);
Keywords
Biomarker; -Omics; Regularization; Feature selection; Translational medicine; CELL; REGULARIZATION; ENCYCLOPEDIA; INTEGRATION; SELECTION; MODELS;
DOI
10.1186/s12967-019-1912-5
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine];
Discipline classification code
1001;
Abstract
Background: Translational medicine (TM) is an emerging domain that aims to move medical and biological advances efficiently from the scientist to the clinician. Central to the TM vision is narrowing the gap between basic and applied science in terms of time, cost and early diagnosis of the disease state. Biomarker identification is one of the main challenges within TM. Identifying disease biomarkers from -omics data will not only help stratify diverse patient cohorts but will also provide early diagnostic information that could improve patient management and potentially prevent adverse outcomes. However, biomarker identification needs to be robust and reproducible. Hence, a robust, unbiased computational framework that helps clinicians identify those biomarkers is necessary.

Methods: We developed a pipeline (workflow) that includes two supervised classification techniques based on regularization methods to identify biomarkers from -omics or other high-dimensional clinical datasets. The pipeline includes several important steps, such as quality control and assessment of the stability of the selected biomarkers. The process takes input files (outcome and independent variables or -omics data) and pre-processes them (normalization, handling of missing values). After a random division of samples into training and test sets, the Least Absolute Shrinkage and Selection Operator (LASSO) and Elastic Net feature selection methods are applied to identify the most important features, which represent potential biomarker candidates. The penalization parameters are optimised using 10-fold cross-validation, and the process undergoes 100 iterations and a combinatorial analysis to select the best-performing multivariate model. An empirical, unbiased assessment of the candidates' quality as biomarkers for clinical use is performed through Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) analysis on both permuted and real data for 1000 different randomized training and test sets. We validated this pipeline against previously published biomarkers.

Results: We applied this pipeline to three datasets with previously published biomarkers: lipidomics data by Acharjee et al. (Metabolomics 13:25, 2017) and transcriptomics data by Rajamani and Bhasin (Genome Med 8:38, 2016) and Mills et al. (Blood 114:1063-1072, 2009). Our results demonstrate that the method identified both previously published biomarkers and new variables that add value to the published results.

Conclusions: We developed a robust pipeline to identify clinically relevant biomarkers that can be applied to different -omics datasets. Such identification reveals potentially novel drug targets and can be used as part of a machine-learning-based patient stratification framework in translational medicine settings.
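The comparison of AUC on real versus permuted data described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the permutation step (no LASSO or Elastic Net fit, no train/test resampling), and the function names `auc`, `permutation_null` and `empirical_p`, as well as the toy scores and labels, are hypothetical and chosen for illustration only.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair count.

    Each (positive, negative) pair scores 1 if the positive sample has the
    higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_null(scores, labels, n_perm=1000, seed=0):
    """AUC values obtained after shuffling the labels n_perm times."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any score/label association
        null.append(auc(scores, shuffled))
    return null


def empirical_p(observed, null):
    """Share of permuted AUCs at least as large as the observed one
    (with the usual +1 correction so the estimate is never zero)."""
    return (1 + sum(a >= observed for a in null)) / (1 + len(null))


# Hypothetical toy data: model scores and binary outcomes for 8 samples.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]

observed_auc = auc(toy_scores, toy_labels)
null_aucs = permutation_null(toy_scores, toy_labels)
p_value = empirical_p(observed_auc, null_aucs)
```

A candidate biomarker panel whose real-data AUC sits well above the bulk of the permuted AUCs is less likely to reflect chance structure in the data; in the pipeline described above, this comparison is repeated over many randomized training and test sets rather than on a single fixed set of scores.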
Pages: 10
Related papers
45 entries in total
  • [1] The translation of lipid profiles to nutritional biomarkers in the study of infant metabolism
    Acharjee, Animesh
    Prentice, Philippa
    Acerini, Carlo
    Smith, James
    Hughes, Ieuan A.
    Ong, Ken
    Griffin, Julian L.
    Dunger, David
    Koulman, Albert
    [J]. METABOLOMICS, 2017, 13 (03)
  • [2] Integration of metabolomics, lipidomics and clinical data using a machine learning method
    Acharjee, Animesh
    Ament, Zsuzsanna
    West, James A.
    Stanley, Elizabeth
    Griffin, Julian L.
    [J]. BMC BIOINFORMATICS, 2016, 17
  • [3] Acharjee, Animesh, 2013, Metabolomics, V3, P1, DOI 10.4172/2153-0769.1000126
  • [4] [Anonymous], 1962, Chem. Eng. Progress.
  • [5] Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets
    Argelaguet, Ricard
    Velten, Britta
    Arnol, Damien
    Dietrich, Sascha
    Zenz, Thorsten
    Marioni, John C.
    Buettner, Florian
    Huber, Wolfgang
    Stegle, Oliver
    [J]. MOLECULAR SYSTEMS BIOLOGY, 2018, 14 (06)
  • [6] Integration of multi-omics data and deep phenotyping enables prediction of cytokine responses
    Bakker, Olivier B.
    Aguirre-Gamboa, Raul
    Sanna, Serena
    Oosting, Marije
    Smeekens, Sanne P.
    Jaeger, Martin
    Zorro, Maria
    Vosa, Urmo
    Withoff, Sebo
    Netea-Maier, Romana T.
    Koenen, Hans J. P. M.
    Joosten, Irma
    Xavier, Ramnik J.
    Franke, Lude
    Joosten, Leo A. B.
    Kumar, Vinod
    Wijmenga, Cisca
    Netea, Mihai G.
    Li, Yang
    [J]. NATURE IMMUNOLOGY, 2018, 19 (07) : 776 - +
  • [7] The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity
    Barretina, Jordi
    Caponigro, Giordano
    Stransky, Nicolas
    Venkatesan, Kavitha
    Margolin, Adam A.
    Kim, Sungjoon
    Wilson, Christopher J.
    Lehar, Joseph
    Kryukov, Gregory V.
    Sonkin, Dmitriy
    Reddy, Anupama
    Liu, Manway
    Murray, Lauren
    Berger, Michael F.
    Monahan, John E.
    Morais, Paula
    Meltzer, Jodi
    Korejwa, Adam
    Jane-Valbuena, Judit
    Mapa, Felipa A.
    Thibault, Joseph
    Bric-Furlong, Eva
    Raman, Pichai
    Shipway, Aaron
    Engels, Ingo H.
    Cheng, Jill
    Yu, Guoying K.
    Yu, Jianjun
    Aspesi, Peter, Jr.
    de Silva, Melanie
    Jagtap, Kalpana
    Jones, Michael D.
    Wang, Li
    Hatton, Charles
    Palescandolo, Emanuele
    Gupta, Supriya
    Mahan, Scott
    Sougnez, Carrie
    Onofrio, Robert C.
    Liefeld, Ted
    MacConaill, Laura
    Winckler, Wendy
    Reich, Michael
    Li, Nanxin
    Mesirov, Jill P.
    Gabriel, Stacey B.
    Getz, Gad
    Ardlie, Kristin
    Chan, Vivien
    Myer, Vic E.
    [J]. NATURE, 2012, 483 (7391) : 603 - 607
  • [8] Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool
    Chen, Edward Y.
    Tan, Christopher M.
    Kou, Yan
    Duan, Qiaonan
    Wang, Zichen
    Meirelles, Gabriela Vaz
    Clark, Neil R.
    Ma'ayan, Avi
    [J]. BMC BIOINFORMATICS, 2013, 14
  • [9] The properties of high-dimensional data spaces: implications for exploring gene and protein expression data
    Clarke, Robert
    Ressom, Habtom W.
    Wang, Antai
    Xuan, Jianhua
    Liu, Minetta C.
    Gehan, Edmund A.
    Wang, Yue
    [J]. NATURE REVIEWS CANCER, 2008, 8 (01) : 37 - 49
  • [10] Lost in Translation-Basic Science in the Era of Translational Research
    Fang, Ferric C.
    Casadevall, Arturo
    [J]. INFECTION AND IMMUNITY, 2010, 78 (02) : 563 - 566