A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

Cited by: 39
Authors
Canela-Xandri, Oriol [1, 2]
Law, Andy [1, 2]
Gray, Alan [3]
Woolliams, John A. [1, 2]
Tenesa, Albert [1, 2, 4]
Affiliations
[1] Univ Edinburgh, Roslin Inst, Edinburgh EH25 9RG, Midlothian, Scotland
[2] Univ Edinburgh, Royal Dick Sch Vet Studies, Edinburgh EH25 9RG, Midlothian, Scotland
[3] Univ Edinburgh, EPCC, Edinburgh EH9 3FD, Midlothian, Scotland
[4] Univ Edinburgh, MRC IGMM, MRC HGU, Edinburgh EH4 2XU, Midlothian, Scotland
Funding
UK Medical Research Council (MRC); UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
AVERAGE INFORMATION REML; MIXED-MODEL ANALYSIS; GENETIC RISK; ASSOCIATION; PREDICTION; DISEASE; TRAITS; ACCURACY;
DOI
10.1038/ncomms10162
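Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract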
Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in approximately 4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes.
Pages: 6