Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA

Cited by: 8
Authors
Xiao, Sihao [1, 2, 11]
Kai, Zhentian [3]
Murphy, Daniel [2, 4]
Li, Dongyang [1, 2]
Patel, Dilip [1, 2]
Bielowka, Adrianna M. [1, 2]
Bernabeu-Herrero, Maria E. [1, 2]
Abdulmogith, Awatif [1, 2]
Mumford, Andrew D. [5]
Westbury, Sarah K. [5]
Aldred, Micheala A. [6]
Vargesson, Neil [7]
Caulfield, Mark J. [8]
Shovlin, Claire L. [1, 2, 10]
Affiliations
[1] Natl Heart & Lung Inst, Imperial Coll London, London W12 0NN, England
[2] Natl Inst Hlth Res NIHR, Imperial Biomed Res Ctr, London W2 1NY, England
[3] Topgen Biopharm Technol Co Ltd, Shanghai 201203, Peoples R China
[4] Imperial Coll Healthcare NHS Trust, Womens Childrens & Clin Support Pharm, London W2 1NY, England
[5] Univ Bristol, Sch Cellular & Mol Med, Bristol BS8 1QU, England
[6] Indiana Univ Sch Med, Div Pulm Crit Care Sleep & Occupat Med, Indianapolis, IN 46202 USA
[7] Univ Aberdeen, Sch Med Med Sci & Nutr, Aberdeen AB25 2ZD, Scotland
[8] Queen Mary Univ London, William Harvey Res Inst, London E1 4NS, England
[9] Genom England, London EC1M 6BQ, England
[10] Imperial Coll Healthcare NHS Trust, Med, London W12 0HS, England
[11] Univ Oxford, Big Data Inst, Oxford, England
Funding
National Institutes of Health (USA);
Keywords
ENCODE DATA; EXPRESSION; RNA; COEFFICIENT; DIAGNOSIS; BROWSER;
DOI
10.1016/j.ajhg.2023.09.005
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY, an analytic tool that integrates coordinates for regions with experimental evidence of functionality. Applied to WGS data from solved and unsolved hereditary hemorrhagic telangiectasia (HHT) recruits to the 100,000 Genomes Project, GROFFFY-based filtration reduced the mean number of variants/DNA from 4,867,167 to 21,486, without deleting disease-causal variants. In three unsolved cases (two related), GROFFFY identified ultra-rare deletions within the 3' untranslated region (UTR) of the tumor suppressor SMAD4, where germline loss-of-function alleles cause combined HHT and colonic polyposis (MIM: 175050). Sited >5.4 kb distal to coding DNA, the deletions did not modify or generate microRNA binding sites, but instead disrupted the sequence context of the final cleavage and polyadenylation site necessary for protein production. By iFoldRNA, an AAUAAA-adjacent 16-nucleotide deletion brought the cleavage site into inaccessible neighboring secondary structures, while a 4-nucleotide deletion unfolded the downstream RNA polymerase II roadblock. SMAD4 RNA expression differed from that in control-derived RNA from resting and cycloheximide-stressed peripheral blood mononuclear cells. Patterns predicted the mutational site for an unrelated HHT/polyposis-affected individual, where a complex insertion was subsequently identified. In conclusion, we describe a functional rare variant type that impacts regulatory systems based on RNA polyadenylation. Extension of coding sequence-focused gene panels is required to capture these variants.
Pages: 1903 - 1918
Number of pages: 17
Related papers
69 in total
  • [1] The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update
    Afgan, Enis
    Baker, Dannon
    Batut, Berenice
    van den Beek, Marius
    Bouvier, Dave
    Cech, Martin
    Chilton, John
    Clements, Dave
    Coraor, Nate
    Gruening, Bjoern A.
    Guerler, Aysam
    Hillman-Jackson, Jennifer
    Hiltemann, Saskia
    Jalili, Vahid
    Rasche, Helena
    Soranzo, Nicola
    Goecks, Jeremy
    Taylor, James
    Nekrutenko, Anton
    Blankenberg, Daniel
    [J]. NUCLEIC ACIDS RESEARCH, 2018, 46 (W1) : W537 - W544
  • [2] Differential expression analysis for sequence count data
    Anders, Simon
    Huber, Wolfgang
    [J]. GENOME BIOLOGY, 2010, 11 (10):
  • [3] Detecting differential usage of exons from RNA-seq data
    Anders, Simon
    Reyes, Alejandro
    Huber, Wolfgang
    [J]. GENOME RESEARCH, 2012, 22 (10) : 2008 - 2017
  • [4] Anderson EL, 2008, P NATL ACAD SCI USA, V105, P14976, DOI [10.1073/pnas.0807297105, 10.1038/s41598-019-45839-z]
  • [5] Identification and validation of a novel pathogenic variant in GDF2 (BMP9) responsible for hereditary hemorrhagic telangiectasia and pulmonary arteriovenous malformations
    Balachandar, Srimmitha
    Graves, Tamara J.
    Shimonty, Anika
    Kerr, Katie
    Kilner, Jill
    Xiao, Sihao
    Slade, Richard
    Sroya, Manveer
    Alikian, Mary
    Curetean, Emanuel
    Thomas, Ellen
    McConnell, Vivienne P. M.
    McKee, Shane
    Boardman-Pretty, Freya
    Devereau, Andrew
    Fowler, Tom A.
    Caulfield, Mark J.
    Alton, Eric W.
    Ferguson, Teena
    Redhead, Julian
    McKnight, Amy J.
    Thomas, Geraldine A.
    Aldred, Micheala A.
    Shovlin, Claire L.
    [J]. AMERICAN JOURNAL OF MEDICAL GENETICS PART A, 2022, 188 (03) : 959 - 964
  • [6] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [7] Bernabeu-Herrero ME, 2023, bioRxiv, DOI 10.1101/2021.12.05.471269
  • [8] The NIH Roadmap Epigenomics Mapping Consortium
    Bernstein, Bradley E.
    Stamatoyannopoulos, John A.
    Costello, Joseph F.
    Ren, Bing
    Milosavljevic, Aleksandar
    Meissner, Alexander
    Kellis, Manolis
    Marra, Marco A.
    Beaudet, Arthur L.
    Ecker, Joseph R.
    Farnham, Peggy J.
    Hirst, Martin
    Lander, Eric S.
    Mikkelsen, Tarjei S.
    Thomson, James A.
    [J]. NATURE BIOTECHNOLOGY, 2010, 28 (10) : 1045 - 1048
  • [9] Trimmomatic: a flexible trimmer for Illumina sequence data
    Bolger, Anthony M.
    Lohse, Marc
    Usadel, Bjoern
    [J]. BIOINFORMATICS, 2014, 30 (15) : 2114 - 2120
  • [10] miRDB: an online database for prediction of functional microRNA targets
    Chen, Yuhao
    Wang, Xiaowei
    [J]. NUCLEIC ACIDS RESEARCH, 2020, 48 (D1) : D127 - D131