Gene-gene interaction: the curse of dimensionality

Cited by: 31
Authors
Chattopadhyay, Amrita [1]
Lu, Tzu-Pin [1]
Affiliation
[1] National Taiwan University, Institute of Epidemiology and Preventive Medicine, Department of Public Health, Taipei, Taiwan
Keywords
Gene-gene interaction; parallel computing; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR); missing heritability; reduction method; neural networks; epistasis
DOI
10.21037/atm.2019.12.87
CLC number
R73 [Oncology]
Subject classification code
100214
Abstract
Genetic variants identified by genome-wide association studies (GWAS) frequently show only modest effects on disease risk, leading to the "missing heritability" problem. One avenue to account for part of this "missingness" is to evaluate gene-gene interactions (epistasis) and thereby elucidate their effect on complex diseases. This can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible genetic interactions among millions of single nucleotide polymorphisms (SNPs) raises several issues, collectively known as the "curse of dimensionality". Because the number of possible SNP combinations grows exponentially, the dimensionality involved in epistatic analysis diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that reduces multi-dimensional genotypes to a one-dimensional binary attribute (high risk versus low risk), led to the emergence of a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have been applied extensively in recent years to tackle the dimensionality issue associated with genome-wide gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still pose the risk of missing relevant SNPs, and interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can potentially take advantage of distributed computing resources in the cloud to bring back smaller subsets of data for further local analysis. Parallel computing can be a powerful resource in fighting this "curse".
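To make the scale concrete: exhaustively testing every pair among 1,000,000 SNPs means C(10^6, 2) = 10^6 × (10^6 − 1) / 2 = 499,999,500,000, roughly 5 × 10^11 two-locus models, before higher-order interactions are even considered. The core MDR step can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes genotypes coded 0/1/2, a binary case/control phenotype with both classes present, and the common convention of labelling a two-locus genotype cell high-risk when its case:control ratio exceeds the overall ratio. The function names `mdr_pair_accuracy` and `exhaustive_pair_scan` are hypothetical, and the score is training accuracy only; full MDR implementations add cross-validation and permutation testing.

```python
from collections import Counter
from itertools import combinations
from math import comb


def mdr_pair_accuracy(geno, pheno, i, j):
    """Simplified MDR score for the SNP pair (i, j): training accuracy only.

    geno:  one list per sample, genotypes coded 0/1/2 (assumed coding)
    pheno: 1 for a case, 0 for a control (both classes assumed present)
    """
    n_cases = sum(pheno)
    threshold = n_cases / (len(pheno) - n_cases)  # overall case:control ratio

    # Count cases and controls in each two-locus genotype cell (at most 3 x 3 = 9).
    cases, controls = Counter(), Counter()
    for sample, label in zip(geno, pheno):
        cell = (sample[i], sample[j])
        if label == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    # Dimensionality reduction: collapse the cells into one binary attribute.
    # A cell is high-risk if its case:control ratio exceeds the threshold.
    high_risk = {
        cell
        for cell in cases.keys() | controls.keys()
        if cases[cell] > threshold * controls[cell]
    }

    # Predict "case" for samples in high-risk cells and count correct predictions.
    correct = sum(
        ((sample[i], sample[j]) in high_risk) == (label == 1)
        for sample, label in zip(geno, pheno)
    )
    return correct / len(pheno)


def exhaustive_pair_scan(geno, pheno):
    """Score every SNP pair; the number of pairs grows as comb(n_snps, 2)."""
    n_snps = len(geno[0])
    scores = {
        (i, j): mdr_pair_accuracy(geno, pheno, i, j)
        for i, j in combinations(range(n_snps), 2)
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: SNPs 0 and 1 interact (case iff their genotype sum is odd);
    # SNP 2 is noise. Neither SNP 0 nor SNP 1 alone separates cases perfectly.
    geno = [[a, b, c] for a in range(3) for b in range(3) for c in range(3)]
    pheno = [(a + b) % 2 for a, b, _ in geno]
    best, scores = exhaustive_pair_scan(geno, pheno)
    print(best, scores[best])  # (0, 1) 1.0
    print(comb(1_000_000, 2))  # 499999500000
```

The toy data encode a purely interactive effect, which is the situation where a pairwise scan such as MDR is meant to help; the same exhaustive loop is also what becomes infeasible at genome scale.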
PySpark supports all standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed for extraordinarily large datasets.
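The pattern described above, farming out an exhaustive search and bringing back only a small subset for local analysis, can be sketched with the Python standard library. This is a stand-in, not PySpark code: a thread pool plays the role of the cluster, each batch of SNP pairs returns only its local top-k, and the "driver" merges those into a global top-k. The scoring function `score_pair` (majority-label accuracy of the two-locus genotype cells) and all other names are hypothetical placeholders. Threads keep the sketch self-contained but give little speedup for CPU-bound pure-Python code because of the GIL; in PySpark, the same shape would roughly correspond to `sc.parallelize(pairs).map(...)` followed by `rdd.top(k)`, so that only k records travel back to the driver.

```python
import heapq
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain, combinations, islice


def score_pair(geno, pheno, i, j):
    """Hypothetical placeholder statistic: accuracy of predicting each sample's
    label by the majority label of its two-locus genotype cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for sample, label in zip(geno, pheno):
        counts[(sample[i], sample[j])][label] += 1
    return sum(max(c) for c in counts.values()) / len(pheno)


def batched(iterable, size):
    """Yield lists of up to `size` items (like itertools.batched in Python 3.12+)."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def score_batch(geno, pheno, pairs, k):
    """Worker task: score one batch of pairs and keep only its local top-k."""
    scored = ((score_pair(geno, pheno, i, j), (i, j)) for i, j in pairs)
    return heapq.nlargest(k, scored)


def parallel_top_pairs(geno, pheno, k=10, batch_size=1000, workers=4):
    """Score all SNP pairs in parallel batches; return only the global top-k."""
    pairs = combinations(range(len(geno[0])), 2)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(score_batch, geno, pheno, batch, k)
            for batch in batched(pairs, batch_size)
        ]
        # Only k results per batch come back to the "driver" to be merged.
        local_tops = (f.result() for f in futures)
        return heapq.nlargest(k, chain.from_iterable(local_tops))


if __name__ == "__main__":
    # Toy data: SNPs 0 and 1 interact; SNP 2 is noise.
    geno = [[a, b, c] for a in range(3) for b in range(3) for c in range(3)]
    pheno = [(a + b) % 2 for a, b, _ in geno]
    print(parallel_top_pairs(geno, pheno, k=2, batch_size=1, workers=2))
```

Returning a per-batch top-k instead of every score keeps the data shipped back to the driver proportional to the number of batches times k rather than to the number of SNP pairs, which is the property that makes the "bring back a smaller subset" strategy workable at genome scale.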
Pages: 5