Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches

Cited by: 48
Authors
Pedersen, Brent S. [1, 2]
Bhetariya, Preetida J. [1]
Brown, Joe [1, 2]
Kravitz, Stephanie N. [1]
Marth, Gabor [1]
Jensen, Randy L. [3]
Bronner, Mary P. [4]
Underhill, Hunter R. [5]
Quinlan, Aaron R. [1, 2, 6]
Affiliations
[1] Univ Utah, Dept Human Genet, 15 S 2030 E, Salt Lake City, UT 84112 USA
[2] Base2 Genom LLC, Salt Lake City, UT 84105 USA
[3] Univ Utah, Huntsman Canc Inst, Dept Neurosurg Radiat Oncol & Ontol Sci, 5th Floor CNC,175 North Med Dr, Salt Lake City, UT 84132 USA
[4] Univ Utah, Huntsman Canc Inst, Dept Pathol, 15 S 2030 E, Salt Lake City, UT 84112 USA
[5] Univ Utah, Dept Pediat, Dept Radiol, Div Med Genet, 15 S 2030 E, Salt Lake City, UT 84112 USA
[6] Univ Utah, Dept Biomed Informat, 421 Wakara Way 140, Salt Lake City, UT 84108 USA
DOI
10.1186/s13073-020-00761-2
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Background: When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the sample donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics.
Results: We introduce the tool and demonstrate its utility on a cohort of five glioma samples each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project.
Conclusions: Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at.
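The idea of comparing per-sample sketches can be illustrated with a minimal Python sketch. It is not Somalier's implementation: the genotype encoding, the function name `pair_metrics`, the toy data, and the particular relatedness formula (shared heterozygous sites minus twice the opposite-homozygote count, scaled by the smaller heterozygote count) are assumptions made here for illustration only.

```python
from itertools import combinations

# Hypothetical encoding (an assumption, not Somalier's file format):
# each sketch is a list of alternate-allele counts at the same ordered sites,
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate,
# None = genotype unknown (for example, too little coverage).

def pair_metrics(a, b):
    """Compare two genotype sketches site by site."""
    ibs0 = shared_hets = hets_a = hets_b = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip sites not genotyped in both samples
        if ga == 1:
            hets_a += 1
        if gb == 1:
            hets_b += 1
        if ga == 1 and gb == 1:
            shared_hets += 1
        if {ga, gb} == {0, 2}:
            ibs0 += 1  # opposite homozygotes: no allele shared
    denom = min(hets_a, hets_b)
    # Assumed relatedness estimate; undefined when a sample has no het sites.
    relatedness = (shared_hets - 2 * ibs0) / denom if denom else float("nan")
    return {"ibs0": ibs0, "shared_hets": shared_hets, "relatedness": relatedness}

# Toy sketches (made-up data): a normal/tumor pair and an unrelated sample.
sketches = {
    "normal": [0, 1, 2, 1, 0, 1],
    "tumor":  [0, 1, 2, 1, 0, 1],
    "other":  [2, 0, 0, 1, 2, 1],
}

for s1, s2 in combinations(sketches, 2):
    print(s1, s2, pair_metrics(sketches[s1], sketches[s2]))
```

In this toy setting, samples from the same individual score near 1 with no opposite-homozygote sites, while a pair with many such sites scores well below that, which is the kind of signal that flags a sample swap.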
Pages: 9