A High-Performance Computing Implementation of Iterative Random Forest for the Creation of Predictive Expression Networks

Cited by: 24
Authors
Cliff, Ashley [1, 2]
Romero, Jonathon [1, 2]
Kainer, David [2]
Walker, Angelica [1, 2]
Furches, Anna [1, 2]
Jacobson, Daniel [1, 2]
Affiliations
[1] Univ Tennessee, Bredesen Ctr Interdisciplinary Res & Grad Educ, Knoxville, TN 37996 USA
[2] Oak Ridge Natl Lab, POB 2009, Oak Ridge, TN 37830 USA
Keywords
Random Forest; Iterative Random Forest; Gene Expression Networks; high-performance computing; X-AI-based eQTL
DOI
10.3390/genes10120996
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
As time progresses and technology improves, biological data sets are continuously increasing in size. New methods and new implementations of existing methods are needed to keep pace with this increase. In this paper, we present a high-performance computing (HPC)-capable implementation of Iterative Random Forest (iRF). This new implementation enables the explainable-AI eQTL analysis of SNP sets with over a million SNPs. Using this implementation, we also present a new method, iRF Leave One Out Prediction (iRF-LOOP), for the creation of Predictive Expression Networks on the order of 40,000 genes or more. We compare the new implementation of iRF with the previous R version and analyze its time to completion on two of the world's fastest supercomputers, Summit and Titan. We also show iRF-LOOP's ability to capture biologically significant results when creating Predictive Expression Networks. This new implementation of iRF will enable the analysis of biological data sets at scales that were previously not possible.
Pages: 13