A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

Cited by: 39
Authors
Canela-Xandri, Oriol [1, 2]
Law, Andy [1, 2]
Gray, Alan [3]
Woolliams, John A. [1, 2]
Tenesa, Albert [1, 2, 4]
Affiliations
[1] Univ Edinburgh, Roslin Inst, Edinburgh EH25 9RG, Midlothian, Scotland
[2] Univ Edinburgh, Royal Dick Sch Vet Studies, Edinburgh EH25 9RG, Midlothian, Scotland
[3] Univ Edinburgh, EPCC, Edinburgh EH9 3FD, Midlothian, Scotland
[4] Univ Edinburgh, MRC IGMM, MRC HGU, Edinburgh EH4 2XU, Midlothian, Scotland
Funding
UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council
Keywords
AVERAGE INFORMATION REML; MIXED-MODEL ANALYSIS; GENETIC RISK; ASSOCIATION; PREDICTION; DISEASE; TRAITS; ACCURACY;
DOI
10.1038/ncomms10162
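Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09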
Abstract
Large-scale genetic and genomic data are increasingly available, and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex-trait analysis, we present DISSECT. DISSECT is new, freely available software that exploits the distributed-memory parallel architectures of compute clusters to perform a wide range of genomic and epidemiologic analyses that currently can be carried out only on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of the tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ~4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes.
Pages: 6