Secure discovery of genetic relatives across large-scale and distributed genomic data sets

Cited by: 2

Authors:

Hong, Matthew M. [1]

Froelicher, David [1, 2]

Magner, Ricky [2]

Popic, Victoria [2]

Berger, Bonnie [1, 2, 3]

Cho, Hyunghoon [4]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[2] Broad Inst Massachusetts Inst Technol & Harvard, Cambridge, MA 02142 USA

[3] MIT, Dept Math, Cambridge, MA 02139 USA

[4] Yale Univ, Dept Biomed Informat & Data Sci, New Haven, CT 06510 USA

Source:

GENOME RESEARCH | 2024 / Vol. 34 / No. 09

Funding:

National Institutes of Health (USA);

Keywords:

CRYPTIC RELATEDNESS; ASSOCIATIONS; INFERENCE; MODEL;

DOI:

10.1101/gr.279057.124

Chinese Library Classification (CLC):

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging owing to the burden of estimating kinship between all the pairs of individuals across data sets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing (LSH) approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enables accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us data sets. On a data set of 200,000 individuals split between two parties, SF-Relate detects 97% of third-degree or closer relatives within 15 h of runtime. Our work enables secure identification of relatives across large-scale genomic data sets.
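
The bucketing idea in the abstract (hash individuals so that likely relatives share a bucket, then compare only individuals in matching buckets across parties) can be illustrated with a minimal sketch. This is not SF-Relate's hash function or protocol: it substitutes generic MinHash over hypothetical string tokens standing in for genomic segments, and it works on plaintext with no encryption. All names (`stable_hash`, `minhash_keys`, `build_buckets`, `candidate_pairs`) and the `num_tables` parameter are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def stable_hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_keys(tokens, num_tables=8):
    """One MinHash bucket key per hash table.

    Two token sets share a key in a given table with probability equal
    to their Jaccard similarity, so similar sets collide more often.
    """
    return [(t, min(stable_hash(t, tok) for tok in tokens))
            for t in range(num_tables)]


def build_buckets(cohort, num_tables=8):
    """Map each bucket key to the set of individual IDs that hash to it."""
    buckets = defaultdict(set)
    for ind_id, tokens in cohort.items():
        for key in minhash_keys(tokens, num_tables):
            buckets[key].add(ind_id)
    return buckets


def candidate_pairs(cohort_a, cohort_b, num_tables=8):
    """Cross-party pairs that share at least one bucket key.

    Only these pairs would go on to the (expensive) kinship test,
    instead of all |A| x |B| pairs.
    """
    buckets_a = build_buckets(cohort_a, num_tables)
    buckets_b = build_buckets(cohort_b, num_tables)
    pairs = set()
    for key in buckets_a.keys() & buckets_b.keys():
        for a in buckets_a[key]:
            for b in buckets_b[key]:
                pairs.add((a, b))
    return pairs
```

In this sketch, each party would call `build_buckets` on its own cohort, and only the pairs returned by `candidate_pairs` would be passed to a relatedness computation. In the setting the abstract describes, that later step is carried out under multiparty homomorphic encryption rather than on plaintext as here.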


Pages: 1312-1323

Page count: 12
