Gene-gene interaction: the curse of dimensionality

Cited by: 30
Authors
Chattopadhyay, Amrita [1]
Lu, Tzu-Pin [1]
Affiliations
[1] Natl Taiwan Univ, Inst Epidemiol & Prevent Med, Dept Publ Hlth, Taipei, Taiwan
Keywords
Gene-gene interaction; parallel computing; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR); missing heritability; reduction method; neural networks; epistasis
DOI
10.21037/atm.2019.12.87
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Genetic variants identified by genome-wide association studies frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue for accounting for part of this "missingness" is to evaluate gene-gene interactions (epistasis) and thereby elucidate their effect on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, the exhaustive evaluation of all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality". The dimensionality involved in the epistatic analysis of such a rapidly growing number of SNP combinations diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that reduces multi-dimensional genotypes to a one-dimensional binary (high-risk versus low-risk) variable, led to the emergence of a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have been applied extensively in recent years to tackle the dimensionality issue in whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods both still carry the risk of missing relevant SNPs. Furthermore, interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud and bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource for fighting this "curse". PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large data sets.
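To make the scale of the search and the core MDR idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`count_interactions`, `mdr_reduce`), the 0/1/2 genotype coding, the toy data, and the rule of comparing each cell's case:control ratio with the overall ratio are illustrative assumptions.

```python
from collections import defaultdict
from math import comb


def count_interactions(n_snps, order):
    """Number of distinct SNP combinations of the given interaction order."""
    return comb(n_snps, order)


def mdr_reduce(genotypes, labels, snp_idx):
    """Collapse the multi-locus genotype at snp_idx into one binary variable.

    genotypes: list of per-sample genotype lists (assumed coding: 0, 1, 2)
    labels:    1 for case, 0 for control
    Returns 1 (high-risk cell) or 0 (low-risk cell) for every sample.
    """
    # Tally controls and cases in each multi-locus genotype cell.
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_idx)
        counts[cell][y] += 1

    n_cases = sum(labels)
    n_controls = len(labels) - n_cases

    # Assumed rule: a cell is high-risk when its case:control ratio is at
    # least the overall ratio (cross-multiplied to avoid dividing by zero).
    high_risk = {
        cell
        for cell, (ctrl, case) in counts.items()
        if case * n_controls >= ctrl * n_cases
    }
    return [
        1 if tuple(row[i] for i in snp_idx) in high_risk else 0
        for row in genotypes
    ]


# Pairwise tests among one million SNPs: roughly 5 x 10^11 combinations.
n_pairs = count_interactions(1_000_000, 2)

# Toy data with a purely interactive (XOR-like) effect between two SNPs.
toy_genotypes = [[0, 0], [0, 1], [1, 0], [1, 1]] * 2
toy_labels = [0, 1, 1, 0] * 2
reduced = mdr_reduce(toy_genotypes, toy_labels, snp_idx=(0, 1))
```

In this toy case neither SNP separates cases from controls on its own, yet the reduced binary variable reproduces the case-control labels exactly. A full MDR analysis would repeat this reduction for every SNP combination and assess it by cross-validation, which is where the combinatorial cost arises.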
Pages: 5