pyGenClean: efficient tool for genetic data clean up before association testing

Cited by: 14
Authors
Perreault, Louis-Philippe Lemieux [1, 2]
Provost, Sylvie [1]
Legault, Marc-Andre [2]
Barhdadi, Amina [1]
Dube, Marie-Pierre [1, 2]
Affiliations
[1] Beaulieu Saucier Univ Montreal, Montreal Heart Inst, Res Ctr, Pharmacogen Ctr, Montreal, PQ, Canada
[2] Univ Montreal, Fac Med, Montreal, PQ, Canada
Keywords
QUALITY-CONTROL;
DOI
10.1093/bioinformatics/btt261
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Genetic association studies that use high-throughput genotyping arrays must process large amounts of data, on the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically a thorough data clean up and quality control to remove poor-quality genotypes and to generate metrics that inform the selection of individuals for downstream statistical analysis. We have developed pyGenClean, a bioinformatics tool to facilitate and standardize the genetic data clean up pipeline for genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates completion of the data clean up process and provides informative plots and metrics to guide decision making for statistical analysis.
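To make the idea of removing poor-quality genotypes concrete, here is a minimal Python sketch of one common quality-control step: filtering markers by call rate (the fraction of samples with a non-missing genotype). It is not pyGenClean's code or API. The data layout, the function names `marker_call_rates` and `filter_markers`, the marker IDs and the thresholds are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical layout: marker name -> one genotype call per sample.
# None marks a missing (failed) call. Not pyGenClean's data format.
Genotypes = Dict[str, List[Optional[str]]]


def marker_call_rates(genotypes: Genotypes) -> Dict[str, float]:
    """Return the fraction of non-missing calls for each marker."""
    rates = {}
    for marker, calls in genotypes.items():
        if not calls:
            # A marker with no samples has no usable calls.
            rates[marker] = 0.0
            continue
        called = sum(1 for call in calls if call is not None)
        rates[marker] = called / len(calls)
    return rates


def filter_markers(genotypes: Genotypes, min_call_rate: float = 0.98) -> Genotypes:
    """Keep markers whose call rate meets an illustrative threshold."""
    rates = marker_call_rates(genotypes)
    return {m: calls for m, calls in genotypes.items() if rates[m] >= min_call_rate}


# Toy data with made-up marker IDs: the second marker is missing half its calls.
toy = {
    "rs0001": ["AA", "AG", "GG", "AG"],
    "rs0002": ["CC", None, None, "CT"],
}
kept = filter_markers(toy, min_call_rate=0.75)
print(sorted(kept))
```

In a real study the threshold is a per-project decision, and the same metric is usually also computed per sample before any marker or individual is excluded.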
Pages: 1704-1705
Number of pages: 2