EBIC: an open source software for high-dimensional and big data analyses

Cited by: 8
Authors
Orzechowski, Patryk [1, 2]
Moore, Jason H. [1]
Affiliations
[1] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[2] AGH Univ Sci & Technol, Dept Automat & Robot, PL-30059 Krakow, Poland
Funding
US National Institutes of Health (NIH)
DOI
10.1093/bioinformatics/btz027
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: In this paper, we present an open source package containing the latest release of Evolutionary-based BIClustering (EBIC), a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is full support for multiple graphics processing units (GPUs), which makes it possible to run large genomic data mining analyses efficiently. Further enhancements over the first release of the algorithm include integration with R and Bioconductor and an option to exclude missing values from the analysis.
Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows. For the largest dataset, we observed a speedup of over 6.6-fold in computation time on a cluster of eight GPUs compared with running the method on a single GPU. This demonstrates the high scalability of the method.
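As a rough reading of the reported result, the sketch below converts the 6.6-fold speedup on eight GPUs into parallel efficiency (speedup divided by the number of GPUs). Under the added assumption that Amdahl's law applies, which the abstract does not state, it also converts the speedup into the implied parallelizable fraction of the workload. The function names are hypothetical and are not part of EBIC. Because the abstract reports "over 6.6-fold", both figures should be read as lower bounds.

```python
# Hypothetical helpers for interpreting a multi-GPU speedup; not part of EBIC.

def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Fraction of ideal linear speedup achieved on n_devices."""
    return speedup / n_devices


def amdahl_parallel_fraction(speedup: float, n_devices: int) -> float:
    """Solve Amdahl's law S = 1 / ((1 - p) + p / N) for the parallel fraction p.

    Assumption: the workload follows Amdahl's law (fixed problem size,
    serial part that does not shrink with more devices).
    """
    return (1 - 1 / speedup) / (1 - 1 / n_devices)


speedup = 6.6  # reported lower bound ("over 6.6-fold")
gpus = 8       # eight GPUs versus a single GPU

eff = parallel_efficiency(speedup, gpus)
p = amdahl_parallel_fraction(speedup, gpus)
print(f"efficiency = {eff:.3f}")                 # efficiency = 0.825
print(f"implied parallel fraction = {p:.3f}")    # implied parallel fraction = 0.970
```

Both quantities increase with the measured speedup, so a speedup above 6.6 implies an efficiency above about 82.5% and, under the Amdahl assumption, a parallel fraction above about 97%.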
Pages: 3181-3183
Number of pages: 3