Gene- or region-based association study via kernel principal component analysis

Cited by: 12
Authors
Gao, Qingsong [1]
He, Yungang [2, 3]
Yuan, Zhongshang [1]
Zhao, Jinghua [4]
Zhang, Bingbing [1]
Xue, Fuzhong [1]
Affiliations
[1] Shandong Univ, Dept Epidemiol & Hlth Stat, Sch Publ Hlth, Jinan 250012, Peoples R China
[2] Chinese Acad Sci, CAS MPG Partner Inst Computat Biol, Shanghai Inst Biol Sci, Shanghai 200031, Peoples R China
[3] Chinese Acad Sci, CAS MPG Partner Inst Computat Biol, Key Lab Computat Biol, Shanghai 200031, Peoples R China
[4] Addenbrookes Hosp, Inst Metab Sci, MRC Epidemiol Unit, Cambridge, England
Funding
National Natural Science Foundation of China;
Keywords
GENOME-WIDE ASSOCIATION; RHEUMATOID-ARTHRITIS; ULCERATIVE-COLITIS; EXPRESSION DATA; RISK LOCI; PTPN22; POLYMORPHISM; METAANALYSIS; IMPUTATION; DISEASES;
DOI
10.1186/1471-2156-12-75
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Background: In genetic association studies, especially GWAS, gene- or region-based methods have become increasingly popular for detecting associations between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been used successfully to classify gene expression data. However, the purpose of an association study is to detect the correlation between genetic variation and disease rather than to classify samples, and genomic data are categorical rather than numerical. A kernel-based logistic regression model for association studies was recently proposed, which projects the nonlinear original SNP data into a linear feature space, but it is still affected by multicollinearity among the projections, which may lead to a loss of power. We therefore proposed a KPCA-LRT model to avoid this multicollinearity.
Results: Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) across different sample sizes, significance levels, and relative risks, especially at the genewide level (1E-5) and at lower relative risks (RR = 1.2, 1.3). Application to four gene regions of the rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than the single-locus test and PCA-LRT.
Conclusions: KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially under lower relative risks and lower significance levels.
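The two-stage idea summarized in the abstract (kernel principal components of a region's SNPs, followed by a likelihood ratio test in a logistic regression on those components) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian (RBF) kernel, the default `gamma`, the number of retained components, and the function names `kernel_pcs` and `kpca_lrt` are all assumptions made for illustration, since the abstract specifies none of them.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian kernel on genotype rows (kernel choice is an assumption).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pcs(X, n_components, gamma):
    # Kernel PCA scores of the samples: eigen-decompose the centered kernel.
    n = X.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ rbf_kernel(X, gamma) @ J
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    # Columns are mutually orthogonal, so they carry no multicollinearity.
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def logistic_loglik(Z, y, n_iter=50, ridge=1e-8):
    # Maximized log-likelihood of a logistic model, fitted by Newton-Raphson.
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None]) + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, Z.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = Z @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def kpca_lrt(G, y, n_components=3, gamma=None):
    # G: samples x SNPs genotype matrix coded 0/1/2; y: 0/1 disease status.
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    if gamma is None:
        gamma = 1.0 / G.shape[1]  # hypothetical default, not from the paper
    pcs = kernel_pcs(G, n_components, gamma)
    ones = np.ones((G.shape[0], 1))
    ll_null = logistic_loglik(ones, y)                    # intercept only
    ll_full = logistic_loglik(np.hstack([ones, pcs]), y)  # intercept + PCs
    # LRT statistic and its degrees of freedom (one per retained component).
    return 2.0 * (ll_full - ll_null), n_components
```

Under the null hypothesis the returned statistic would be compared against a chi-square distribution with `n_components` degrees of freedom; how many components to retain is a tuning decision that this sketch leaves open.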
Pages: 8