A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

Authors
Canela-Xandri, Oriol [1, 2]
Law, Andy [1, 2]
Gray, Alan [3]
Woolliams, John A. [1, 2]
Tenesa, Albert [1, 2, 4]
Affiliations
[1] Univ Edinburgh, Roslin Inst, Edinburgh EH25 9RG, Midlothian, Scotland
[2] Univ Edinburgh, Royal Dick Sch Vet Studies, Edinburgh EH25 9RG, Midlothian, Scotland
[3] Univ Edinburgh, EPCC, Edinburgh EH9 3FD, Midlothian, Scotland
[4] Univ Edinburgh, MRC IGMM, MRC HGU, Edinburgh EH4 2XU, Midlothian, Scotland
Funding
UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council
Keywords
AVERAGE INFORMATION REML; MIXED-MODEL ANALYSIS; GENETIC RISK; ASSOCIATION; PREDICTION; DISEASE; TRAITS; ACCURACY;
DOI
10.1038/ncomms10162
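Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09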
Abstract
Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in approximately 4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes.
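The prediction task named in the abstract can be illustrated on a toy scale. The following is a minimal single-machine sketch of genomic BLUP, one common mixed-linear model predictor; it is not DISSECT's implementation. The function names, the simulation settings (600 individuals, 200 SNPs, heritability 0.5) and the assumption that the heritability is known in advance are all illustrative choices, not taken from the paper.

```python
import numpy as np


def simulate(n, m, h2, rng):
    """Simulate standardized genotypes and a trait with heritability h2 (toy data)."""
    freqs = rng.uniform(0.1, 0.9, size=m)             # hypothetical allele frequencies
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    # Standardized over the whole sample for simplicity (an illustrative shortcut).
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)   # small additive SNP effects
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z, y


def gblup_predict(Z_train, y_train, Z_test, h2):
    """Predict test phenotypes from a mixed linear model with a known h2 (assumed)."""
    m = Z_train.shape[1]
    A_tt = Z_train @ Z_train.T / m                    # relationships among training individuals
    A_st = Z_test @ Z_train.T / m                     # test-by-training relationships
    lam = (1.0 - h2) / h2                             # ratio of residual to genetic variance
    mu = y_train.mean()
    alpha = np.linalg.solve(A_tt + lam * np.eye(len(y_train)), y_train - mu)
    return mu + A_st @ alpha


rng = np.random.default_rng(0)
Z, y = simulate(600, 200, 0.5, rng)
pred = gblup_predict(Z[:500], y[:500], Z[500:], 0.5)
r = np.corrcoef(pred, y[500:])[0, 1]
print(f"correlation between predicted and observed test phenotypes: {r:.2f}")
```

The dense solve above is the step that stops scaling on one machine: if a relationship matrix for 470,000 individuals were stored densely in double precision, it would hold 470,000² entries, roughly 1.8 TB, which suggests why the abstract emphasizes distributed-memory clusters.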
Pages: 6