MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis

Times cited: 33
Authors
Brunel, Helena [1,2,3]
Gallardo-Chacon, Joan-Josep [2,3]
Buil, Alfonso [4]
Vallverdu, Montserrat [2,3]
Manuel Soria, Jose [4]
Caminal, Pere [2,3]
Perera, Alexandre [2,3]
Affiliations
[1] Univ Politecn Cataluna, Inst Bioengn Catalunya, E-08028 Barcelona, Spain
[2] Univ Politecn Cataluna, Dept Engn Sistemes Automat & Informat Ind, E-08028 Barcelona, Spain
[3] Hosp Santa Creu & Sant Pau, CIBER Bioingn Biomat & Biomed, Barcelona 08025, Spain
[4] Hosp Santa Creu & Sant Pau, Inst Recerca, Unitat Genom Malalties Complexes, Barcelona 08025, Spain
Keywords
FACTOR-VII GENE; COMPLEX DISEASE; F7; GENE; SELECTION; POLYMORPHISM; ALGORITHM; RISK
DOI
10.1093/bioinformatics/btq273
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Finding associations between genetic variants and disease-related phenotypes has become an important vehicle for studying complex disorders. In this context, multi-locus genetic association analysis may reveal information that a single-locus search misses. The main goal of this work is to propose a non-linear methodology, based on information theory, for finding combinatorial associations between sets of SNPs and a given phenotype.

Results: The proposed methodology, called MISS (mutual information statistical significance), has been integrated with a feature selection algorithm and tested on a synthetic dataset with a controlled phenotype and on the particular case of the F7 gene. MISS has been compared with a multiple linear regression (MLR) method for genetic association, in both a population-based study and a sib-pair analysis, and with the maximum entropy conditional probability modelling (MECPM) method, which searches for predictive multi-locus interactions. Several sets of SNPs within the F7 gene region were found to correlate significantly with FVII levels in blood. The proposed multi-site approach reveals combinations of SNPs that carry more significant information about the phenotype than their individual polymorphisms do. MISS finds more correlations between SNPs and the phenotype than MLR and MECPM. Most of the SNPs identified are reported in the literature as functional variants with a real effect on FVII protein levels in blood.
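The abstract describes MISS only at a high level. As a rough illustration of the underlying idea, and not the authors' implementation, the Python sketch below estimates the mutual information between a phenotype and the joint genotype of one or more SNPs, then assesses its significance with a permutation test. Everything specific here is an assumption made for illustration: the plug-in entropy estimator, the quantile binning of the continuous phenotype, the permutation scheme, the hypothetical SNP names `rs_a`/`rs_b`, and the simulated parity-interaction phenotype. The paper's actual estimator and significance procedure may differ.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of a discrete sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired discrete samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def joint_genotype(snps, columns):
    """Encode several SNP columns as one discrete variable: one tuple per individual."""
    return [tuple(snps[c][i] for c in columns) for i in range(len(snps[columns[0]]))]


def discretize(values, n_bins=3):
    """Assumed preprocessing: bin a continuous phenotype into quantile-based classes."""
    ranked = sorted(values)
    cuts = [ranked[len(ranked) * k // n_bins] for k in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Observed MI and the fraction of phenotype shuffles reaching it (+1 correction)."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any genotype-phenotype link
        if mutual_information(x, y_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def simulate(n=500, seed=1):
    """Hypothetical toy data: two SNPs whose effect exists only in combination."""
    rng = random.Random(seed)
    # Minor-allele counts drawn from Binomial(2, 0.5), so odd and even counts are equally likely.
    a = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
    b = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
    # The level rises by 10 units when the summed allele count is odd (pure interaction).
    level = [100 + 10 * ((ai + bi) % 2) + rng.gauss(0, 2) for ai, bi in zip(a, b)]
    return {"rs_a": a, "rs_b": b}, level


if __name__ == "__main__":
    snps, level = simulate()
    y = discretize(level, n_bins=2)
    for cols in (["rs_a"], ["rs_b"], ["rs_a", "rs_b"]):
        mi, p = permutation_pvalue(joint_genotype(snps, cols), y, n_perm=200)
        print(cols, round(mi, 3), round(p, 4))
```

On the simulated data, each SNP alone carries almost no information about the binned phenotype, while the pair carries most of it. This is the kind of combinatorial effect that single-locus tests would not detect. Plug-in MI estimates also grow with the number of joint genotype categories even under independence, which is why larger SNP sets call for a significance test rather than a raw comparison of MI values.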
Pages: 1811-1818
Page count: 8
References: 48