Integration of multi-omics data for prediction of phenotypic traits using random forest

Cited by: 69
Authors
Acharjee, Animesh [1 ,3 ]
Kloosterman, Bjorn [1 ,2 ]
Visser, Richard G. F. [1 ]
Maliepaard, Chris [1 ]
Affiliations
[1] Univ Wageningen & Res Ctr, Wageningen UR Plant Breeding, NL-6700 AJ Wageningen, Netherlands
[2] Keygene NV, POB 216, NL-6700 AE Wageningen, Netherlands
[3] MRC Human Nutr Res, 120 Fulbourn Rd, Cambridge CB1 9NL, England
Source
BMC BIOINFORMATICS | 2016 / Volume 17
Keywords
Data integration; Genetical genomics; Networks; Random forest; GENETIC GENOMICS; POTATO; EXPRESSION; QTL; RNA;
DOI
10.1186/s12859-016-1043-4
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: To find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This yields networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest.
Results: We used Random Forest regression to integrate multiple ~omics data sets for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase, both of which have previously been associated with flesh colour in potato tubers, were ranked first and forty-fourth, respectively. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. When tuber shape was regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data were found to explain significant variation. For DSC onset, we found 12 significant gene expression variables, 5 metabolite levels (GC) and 2 proteins associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also had the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis of enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, two of which were putatively identified as caffeoylquinic acid methyl ester and tyrosine.
Conclusions: In this study, we developed a strategy for selecting and integrating multiple ~omics data sets using the random forest method, and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data.
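The general idea of regressing a trait on several concatenated ~omics blocks with Random Forest, then ranking variables and reading off the variation explained, might look roughly like the sketch below. It is an illustration only, not the authors' pipeline: the data are synthetic, the block names (`gene`, `metab`, `prot`), sample size and effect sizes are all assumptions, and scikit-learn's impurity-based importance and out-of-bag R² stand in for whatever importance measure and significance procedure the paper actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 200  # assumed sample size, not the size of the potato population

# Hypothetical ~omics blocks filled with synthetic values (not the real data).
blocks = {
    "gene": rng.normal(size=(n_samples, 30)),
    "metab": rng.normal(size=(n_samples, 10)),
    "prot": rng.normal(size=(n_samples, 5)),
}

# Synthetic trait driven by one gene and one metabolite, plus noise.
trait = (
    2.0 * blocks["gene"][:, 0]
    + 1.5 * blocks["metab"][:, 3]
    + rng.normal(scale=0.3, size=n_samples)
)

# Integrate the blocks by column-wise concatenation, keeping variable names.
X = np.hstack(list(blocks.values()))
names = [f"{block}_{j}" for block, m in blocks.items() for j in range(m.shape[1])]

# Fit the forest; oob_score=True gives an out-of-bag R^2 estimate.
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, trait)

# Rank all variables across blocks by impurity-based importance.
ranked = sorted(
    zip(names, forest.feature_importances_), key=lambda p: p[1], reverse=True
)

print(f"Out-of-bag variation explained (R^2): {forest.oob_score_:.2f}")
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```

Because variables from all blocks compete in the same ranking, correlated predictors in different data sets share importance, which is consistent with the abstract's remark that combining data sets explained only slightly more variation than gene expression or LC-MS data alone.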
Pages: 11