Machine learning-based analysis of multi-omics data on the cloud for investigating gene regulations

Cited by: 30
Authors
Oh, Minsik [2]
Park, Sungjoon [2]
Kim, Sun [3]
Chae, Heejoon [1]
Affiliations
[1] Sookmyung Women's University, Division of Computer Science, Seoul 04310, South Korea
[2] Seoul National University, Computer Science, Seoul, South Korea
[3] Seoul National University, Department of Computer Science and Engineering, Seoul, South Korea
Funding
National Research Foundation of Singapore
Keywords
gene regulation; machine learning; cloud computing; multi-omics analysis; bioinformatics
DOI
10.1093/bib/bbaa032
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Gene expression is subtly regulated by quantifiable measures of genetic molecules, such as interactions with other genes, methylation, mutations, transcription factors and histone modifications. Integrative analysis of multi-omics data can help scientists understand condition- or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging, since it requires not only the analysis of multiple omics data sets but also the mining of complex relations among different genetic molecules using state-of-the-art machine learning methods. In addition, analysis of multi-omics data requires a large computing infrastructure. Moreover, interpreting the results requires collaboration among many scientists and often involves re-running the analysis from different perspectives. Many of these technical issues can be handled well when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation studies and categorize them according to five goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. We then explain why the cloud is potentially a good solution for multi-omics data analysis, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues that arise when the cloud is used to analyze multi-omics data for gene regulation studies.
Pages: 66-76
Page count: 11