Integration of multi-omics data for prediction of phenotypic traits using random forest

Cited by: 70
Authors
Acharjee, Animesh [1, 3]
Kloosterman, Bjorn [1, 2]
Visser, Richard G. F. [1 ]
Maliepaard, Chris [1 ]
Affiliations
[1] Univ Wageningen & Res Ctr, Wageningen UR Plant Breeding, NL-6700 AJ Wageningen, Netherlands
[2] Keygene NV, POB 216, NL-6700 AE Wageningen, Netherlands
[3] MRC Human Nutr Res, 120 Fulbourn Rd, Cambridge CB1 9NL, England
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Keywords
Data integration; Genetical genomics; Networks; Random forest; GENETIC GENOMICS; POTATO; EXPRESSION; QTL; RNA;
DOI
10.1186/s12859-016-1043-4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest.
Results: We used Random Forest regression to integrate multiple ~omics data sets for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth, respectively; both have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. When tuber shape was regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data were found to explain significant variation. For DSC onset, we found 12 significant gene expression levels, 5 metabolite levels (GC) and 2 proteins that are associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also had the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis of enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, among which two were putatively identified as caffeoylquinic acid methyl ester and tyrosine.
Conclusions: In this study, we developed a strategy for selecting and integrating multiple ~omics data sets using the random forest method, and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data.
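The general idea described in the abstract (regressing a trait on several ~omics blocks with Random Forest, ranking variables, and reporting the variation explained) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the block names and sizes are hypothetical, scikit-learn's `RandomForestRegressor` is an assumed tool choice, and the out-of-bag R² is used here only as a stand-in for "variation explained". The selection of significant variables mentioned in the abstract is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 200  # hypothetical number of genotypes

# Synthetic stand-ins for the four ~omics blocks (sizes are arbitrary).
blocks = {
    "gene": rng.normal(size=(n_samples, 20)),
    "lcms": rng.normal(size=(n_samples, 10)),
    "gcms": rng.normal(size=(n_samples, 10)),
    "protein": rng.normal(size=(n_samples, 5)),
}

# Synthetic trait driven by one transcript and one LC-MS peak plus noise.
trait = (
    3.0 * blocks["gene"][:, 0]
    + 2.0 * blocks["lcms"][:, 3]
    + rng.normal(scale=0.5, size=n_samples)
)

# Integrate the blocks by column-wise concatenation, keeping variable names.
X = np.hstack(list(blocks.values()))
names = [f"{block}_{j}" for block, m in blocks.items() for j in range(m.shape[1])]

# Fit the forest; oob_score=True yields an out-of-bag R^2 estimate.
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, trait)

# Rank variables by impurity-based importance, highest first.
ranking = sorted(
    zip(names, forest.feature_importances_), key=lambda pair: pair[1], reverse=True
)
oob_r2 = forest.oob_score_

for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
print(f"out-of-bag R^2: {oob_r2:.2f}")
```

In this toy setting the two planted variables should appear at the top of the ranking. With real data, impurity-based importances can be biased, so a permutation-based check would be a sensible addition.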
Pages: 11
Related papers
50 results in total
  • [21] Identification of ovarian cancer driver genes by using module network integration of multi-omics data
    Gevaert, Olivier
    Villalobos, Victor
    Sikic, Branimir I.
    Plevritis, Sylvia K.
    [J]. INTERFACE FOCUS, 2013, 3 (04)
  • [22] Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction
    Ranjbari, Sima
    Arslanturk, Suzan
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 147
  • [23] Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge
    Tripp, Bridget A.
    Otu, Hasan H.
    [J]. CURRENT BIOINFORMATICS, 2022, 17 (01) : 37 - 47
  • [24] A comprehensive survey of the approaches for pathway analysis using multi-omics data integration
    Maghsoudi, Zeynab
    Nguyen, Ha
    Tavakkoli, Alireza
    Nguyen, Tin
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (06)
  • [25] A multi-omics graph database for data integration and knowledge extraction
    Kim, Suyeon
    Thapa, Ishwor
    Ali, Hesham
    [J]. 13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022, 2022
  • [26] Intricacies of single-cell multi-omics data integration
    Rautenstrauch, Pia
    Vlot, Anna Hendrika Cornelia
    Saran, Sepideh
    Ohler, Uwe
    [J]. TRENDS IN GENETICS, 2022, 38 (02) : 128 - 139
  • [27] Integration strategies of multi-omics data for machine learning analysis
    Picard, Milan
    Scott-Boyer, Marie-Pier
    Bodein, Antoine
    Perin, Olivier
    Droit, Arnaud
    [J]. COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 3735 - 3746
  • [28] Directional integration and pathway enrichment analysis for multi-omics data
    Slobodyanyuk, Mykhaylo
    Bahcheli, Alexander T.
    Klein, Zoe P.
    Bayati, Masroor
    Strug, Lisa J.
    Reimand, Juri
    [J]. NATURE COMMUNICATIONS, 2024, 15 (01)
  • [29] Benchmarking algorithms for single-cell multi-omics prediction and integration
    Hu, Yinlei
    Wan, Siyuan
    Luo, Yuanhanyu
    Li, Yuanzhe
    Wu, Tong
    Deng, Wentao
    Jiang, Chen
    Jiang, Shan
    Zhang, Yueping
    Liu, Nianping
    Yang, Zongcheng
    Chen, Falai
    Li, Bin
    Qu, Kun
    [J]. NATURE METHODS, 2024, 21 (11) : 2182 - +
  • [30] Evaluating the performance of multi-omics integration: a thyroid toxicity case study
    Canzler, Sebastian
    Schubert, Kristin
    Rolle-Kampczyk, Ulrike E.
    Wang, Zhipeng
    Schreiber, Stephan
    Seitz, Herve
    Mockly, Sophie
    Kamp, Hennicke
    Haake, Volker
    Huisinga, Maike
    Bergen, Martin von
    Buesen, Roland
    Hackermueller, Joerg
    [J]. ARCHIVES OF TOXICOLOGY, 2025, 99 (01) : 309 - 332