Gene-gene interaction: the curse of dimensionality

Cited by: 30
Authors
Chattopadhyay, Amrita [1]
Lu, Tzu-Pin [1]
Affiliation
[1] Natl Taiwan Univ, Inst Epidemiol & Prevent Med, Dept Publ Hlth, Taipei, Taiwan
Keywords
Gene-gene interaction; parallel computing; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR); MISSING HERITABILITY; REDUCTION METHOD; NEURAL-NETWORKS; EPISTASIS;
DOI
10.21037/atm.2019.12.87
Chinese Library Classification (CLC) number
R73 [Oncology]
Subject classification code
100214
Abstract
Genetic variants identified by genome-wide association studies (GWAS) frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue for accounting for part of this "missingness" is to evaluate gene-gene interactions (epistasis) and thereby elucidate their effects on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality". Because the number of candidate SNP combinations grows combinatorially with the number of SNPs and the order of interaction, traditional parametric statistical methods lose much of their usefulness for epistatic analysis. Multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that pools multi-dimensional genotypes into a one-dimensional binary (high-risk versus low-risk) attribute, became immensely popular and gave rise to a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have been widely applied in recent years to tackle the dimensionality problem in genome-wide gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still risk missing relevant SNPs, and poor interpretability is a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially exploit distributed computing resources in the cloud and bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful weapon against this "curse".
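A minimal Python sketch can make two points above concrete: how quickly the number of candidate SNP combinations grows, and how MDR collapses a two-locus genotype into a single binary attribute. It assumes a case-control sample with genotypes coded 0/1/2 and, by default, uses the overall case:control ratio as the high-risk threshold. The helper names (`count_interactions`, `mdr_high_risk_cells`, `mdr_attribute`) are hypothetical and not taken from any MDR software, and the cross-validation and permutation testing used in full MDR are omitted.

```python
from collections import Counter
from math import comb


def count_interactions(n_snps, order):
    """Number of distinct SNP combinations of a given order (2 = pairs)."""
    return comb(n_snps, order)


def mdr_high_risk_cells(snp_a, snp_b, status, threshold=None):
    """Label two-locus genotype cells as high-risk, as in MDR's pooling step.

    snp_a, snp_b: per-individual genotypes coded 0/1/2 (assumed coding).
    status: 1 for cases, 0 for controls.
    threshold: case:control ratio a cell must meet or exceed to be high-risk;
        defaults to the overall case:control ratio of the sample.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        (cases if y == 1 else controls)[(a, b)] += 1
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    high = set()
    for cell in cases.keys() | controls.keys():
        n_case, n_ctrl = cases[cell], controls[cell]
        # A cell with cases but no controls has an unbounded ratio: high-risk.
        if n_ctrl == 0 or n_case / n_ctrl >= threshold:
            high.add(cell)
    return high


def mdr_attribute(snp_a, snp_b, high_cells):
    """Collapse each two-locus genotype into one binary attribute (1 = high-risk)."""
    return [int((a, b) in high_cells) for a, b in zip(snp_a, snp_b)]
```

For scale, `count_interactions(1_000_000, 2)` returns 499,999,500,000 pairs, and each of those pairs would need its own pooling step like the one above, which is why genome-wide scans must be filtered or distributed.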
PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed on extraordinarily large datasets.
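The partition-score-reduce pattern behind this kind of speed-up can be sketched with the Python standard library alone: chunks of SNP pairs are scored concurrently and only the top-k pairs are kept, mirroring the idea of bringing a small subset back for local analysis. Several assumptions apply. The Pearson chi-square of case/control status against the joint two-locus genotype is a swapped-in score (it measures joint association, not interaction specifically). The helper names are hypothetical. `ThreadPoolExecutor` stands in for a cluster; because of Python's global interpreter lock, threads will not speed up this CPU-bound pure-Python scoring, so the sketch shows structure only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from heapq import nlargest
from itertools import combinations, islice


def pair_chi_square(genotypes, status, i, j):
    """Pearson chi-square of status (0/1) vs. the joint genotype of SNPs i and j.

    genotypes: one row per individual, one genotype code per SNP.
    """
    observed = Counter()  # ((g_i, g_j), status) -> count
    for row, y in zip(genotypes, status):
        observed[((row[i], row[j]), y)] += 1
    cell_totals = Counter()
    for (cell, _), count in observed.items():
        cell_totals[cell] += count
    status_totals = Counter(status)
    n = len(status)
    chi2 = 0.0
    for cell, cell_n in cell_totals.items():
        for y, y_n in status_totals.items():
            expected = cell_n * y_n / n
            chi2 += (observed[(cell, y)] - expected) ** 2 / expected
    return chi2


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def score_chunk(genotypes, status, pairs):
    """Score one chunk of SNP pairs; this is the unit of work handed to a worker."""
    return [((i, j), pair_chi_square(genotypes, status, i, j)) for i, j in pairs]


def top_pairs(genotypes, status, k=10, chunk_size=1000, workers=4):
    """Score all SNP pairs chunk by chunk and keep only the k highest-scoring."""
    n_snps = len(genotypes[0])
    chunks = chunked(combinations(range(n_snps), 2), chunk_size)
    best = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: score_chunk(genotypes, status, c), chunks)
        for scored in results:
            # Reduce step: merge each chunk's scores into the running top-k.
            best = nlargest(k, best + scored, key=lambda r: r[1])
    return best
```

`Executor.map` submits every chunk up front, so this sketch does not scale to genome-wide pair counts. In PySpark, the analogous pipeline would distribute pair chunks with `SparkContext.parallelize`, score them with `RDD.map` or `RDD.flatMap`, and return only the best pairs to the driver with `RDD.top`.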
Pages: 5