Gene-gene interaction: the curse of dimensionality

Cited by: 30
Authors
Chattopadhyay, Amrita [1]
Lu, Tzu-Pin [1]
Affiliation
[1] Natl Taiwan Univ, Inst Epidemiol & Prevent Med, Dept Publ Hlth, Taipei, Taiwan
Keywords
Gene-gene interaction; parallel computing; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR); MISSING HERITABILITY; REDUCTION METHOD; NEURAL-NETWORKS; EPISTASIS;
DOI
10.21037/atm.2019.12.87
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Genetic variants identified by genome-wide association studies (GWAS) frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue for accounting for part of this "missingness" is to evaluate gene-gene interactions (epistasis) and thereby elucidate their effects on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality": the number of candidate SNP combinations grows combinatorially with the order of interaction, which diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that collapses multi-dimensional genotypes into a one-dimensional binary (high-risk versus low-risk) variable, led to a fast-growing collection of MDR-based methods. In addition, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have been widely applied in recent years to tackle the dimensionality issue in genome-wide gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still risk missing relevant SNPs, and interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud and bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource in the fight against this "curse".
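To make the dimensionality problem and the MDR idea concrete, here is a minimal Python sketch, not the authors' code. It counts the two-locus combinations among one million SNPs, then applies a simplified MDR step to a SNP pair: each two-locus genotype cell is labeled high risk when its case:control ratio is at least the overall case:control ratio, which collapses the 3 × 3 genotype table into a single binary attribute. The helper names (`n_interactions`, `mdr_pair`, `best_pair`) and the toy data are hypothetical, and the sketch scores pairs by plain training accuracy, omitting the cross-validation that MDR uses in practice.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def n_interactions(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations at a given interaction order."""
    return comb(n_snps, order)


def mdr_pair(genotypes, status, i, j):
    """Simplified MDR step for SNPs i and j (hypothetical helper).

    genotypes: per-sample lists of genotypes coded 0/1/2
    status:    per-sample disease status, 1 = case, 0 = control
    Returns (set of high-risk genotype cells, training accuracy).
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    counts = defaultdict(lambda: [0, 0])  # (g_i, g_j) cell -> [controls, cases]
    for g, label in zip(genotypes, status):
        counts[(g[i], g[j])][label] += 1
    # A cell is high risk if cases/controls >= T = n_cases / n_controls;
    # cross-multiplying keeps the comparison in exact integer arithmetic.
    high_risk = {
        cell
        for cell, (ctrl, case) in counts.items()
        if case * n_controls >= n_cases * ctrl
    }
    # The 2-D genotype table is now a 1-D binary attribute: in high_risk or not.
    correct = sum(
        ((g[i], g[j]) in high_risk) == (label == 1)
        for g, label in zip(genotypes, status)
    )
    return high_risk, correct / len(status)


def best_pair(genotypes, status):
    """Exhaustively score every SNP pair and return the most accurate one."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), 2),
        key=lambda pair: mdr_pair(genotypes, status, *pair)[1],
    )


# Hypothetical toy data: 27 samples x 3 SNPs; status is determined jointly
# by SNP0 and SNP1 (parity of their genotype sum), and SNP2 is unrelated.
G = [[a, b, c] for a in range(3) for b in range(3) for c in range(3)]
y = [(a + b) % 2 for a, b, _ in G]

print(n_interactions(1_000_000, 2))  # 499999500000
print(best_pair(G, y))               # (0, 1)
print(mdr_pair(G, y, 0, 1)[1])       # 1.0
```

Even at order two, one million SNPs yield about 5 × 10^11 pairs, so the exhaustive loop in `best_pair` is precisely the step that motivates parallel computing.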
PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed on extraordinarily large datasets.
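The "bring back smaller subsets of data" pattern can also be sketched locally. The block below is a hypothetical illustration, not the authors' PySpark code: it splits all SNP pairs into chunks, scores each chunk in a worker with a compact MDR-style accuracy, and returns only the pairs that pass a cutoff. It substitutes Python's standard-library `ThreadPoolExecutor` for a Spark cluster; threads show the map/filter/collect structure but do not speed up CPU-bound Python, so a process pool or PySpark itself would be needed for real gains. The names `pair_accuracy`, `chunks`, `scan`, and `cutoff` are illustrative assumptions, and the trailing comment only roughly indicates how the pipeline could map onto PySpark's RDD `parallelize`/`map`/`filter`/`collect` calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, islice


def pair_accuracy(genotypes, status, pair):
    """Compact MDR-style training accuracy for one SNP pair (illustrative)."""
    i, j = pair
    cells = [(g[i], g[j]) for g in genotypes]
    cases = Counter(c for c, s in zip(cells, status) if s == 1)
    ctrls = Counter(c for c, s in zip(cells, status) if s == 0)
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    # High-risk cells: case:control ratio at least the overall ratio.
    high = {c for c in set(cells) if cases[c] * n_ctrl >= n_case * ctrls[c]}
    correct = sum((c in high) == (s == 1) for c, s in zip(cells, status))
    return correct / len(status)


def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def scan(genotypes, status, cutoff, chunk_size=1000, workers=4):
    """Score all SNP pairs chunk by chunk; keep only pairs scoring >= cutoff."""
    n_snps = len(genotypes[0])

    def score_chunk(batch):
        # Each worker returns only its small filtered subset, not every score.
        scored = ((pair, pair_accuracy(genotypes, status, pair)) for pair in batch)
        return [(pair, acc) for pair, acc in scored if acc >= cutoff]

    pairs = combinations(range(n_snps), 2)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(score_chunk, chunks(pairs, chunk_size))
        hits = [hit for part in parts for hit in part]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Rough PySpark equivalent (sketch; assumes a SparkContext `sc` and that the
# genotype data are shipped to executors, e.g. with sc.broadcast):
#   sc.parallelize(list(combinations(range(n_snps), 2))) \
#     .map(lambda p: (p, pair_accuracy(G, y, p))) \
#     .filter(lambda r: r[1] >= cutoff) \
#     .collect()

# Hypothetical toy data: 27 samples x 3 SNPs; status depends jointly on SNP0 and SNP1.
G = [[a, b, c] for a in range(3) for b in range(3) for c in range(3)]
y = [(a + b) % 2 for a, b, _ in G]
print(scan(G, y, cutoff=0.9, chunk_size=1))  # [((0, 1), 1.0)]
```

Filtering inside each worker is the design choice that matters here: only the few passing pairs travel back to the driver, which mirrors the idea of returning a small subset for local follow-up analysis.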
Pages: 5