High-throughput deep learning variant effect prediction with Sequence UNET

Cited by: 13
Authors
Dunham, Alistair S. [1, 2]
Beltrao, Pedro [1, 3]
AlQuraishi, Mohammed [4]
Affiliations
[1] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Genome Campus, Hinxton CB10 1SD, Cambs, England
[2] Wellcome Sanger Inst, Wellcome Genome Campus, Hinxton CB10 1RQ, Cambs, England
[3] Swiss Fed Inst Technol, Inst Mol Syst Biol, Dept Biol, CH-8093 Zurich, Switzerland
[4] Columbia Univ, Dept Syst Biol, New York, NY 10027 USA
Funding
Wellcome Trust (UK)
Keywords
Variant effect prediction; Deep learning; Mutation; PSSM; Pathogenicity; Machine learning; SERVER;
DOI
10.1186/s13059-023-02948-3
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture. It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
Pages: 19