Large-scale data mining of four billion human antibody variable regions reveals convergence between therapeutic and natural antibodies that constrains search space for biologics drug discovery

Cited by: 1
Authors
Dudzic, Pawel [1]
Chomicz, Dawid [1]
Konczak, Jaroslaw [1]
Satlawa, Tadeusz [1]
Janusz, Bartosz [1]
Wrobel, Sonia [1]
Gawlowski, Tomasz [1]
Jaszczyszyn, Igor [1]
Bielska, Weronika [1]
Demharter, Samuel [2]
Spreafico, Roberto [3]
Schulte, Lukas [4]
Martin, Kyle [5]
Comeau, Stephen R. [5]
Krawczyk, Konrad [1]
Affiliations
[1] NaturalAntibody, Szczecin, Poland
[2] Genmab, Discovery Data Sci, Copenhagen, Denmark
[3] Genmab, Discovery Data Sci, Utrecht, Netherlands
[4] Boehringer Ingelheim Pharma GmbH & Co KG, Global Computat Biol & Digital Sci, Biberach, Germany
[5] Boehringer Ingelheim GmbH & Co KG, Biotherapeut Discovery, Ridgefield, CT USA
Keywords
CDR-H3; database; repertoire; REPERTOIRE; CELL; SELECTION;
DOI
10.1080/19420862.2024.2361928
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
The naïve human antibody repertoire has theoretical access to an estimated >10^15 antibodies. Identifying subsets of this prohibitively large space where therapeutically relevant antibodies may be found is useful for the development of these agents. It was previously demonstrated that, despite the immense sequence space, different individuals can produce the same antibodies. It was also shown that therapeutic antibodies, which typically follow seemingly unnatural development processes, can arise independently in nature. To check for biases in how the sequence space is explored, we data mined public repositories to identify 220 bioprojects with a combined seven billion reads. Of these, we created a subset of human bioprojects that we make available as the AbNGS database (https://naturalantibody.com/ngs/). AbNGS contains 135 bioprojects with four billion productive human heavy variable region sequences and 385 million unique complementarity-determining region (CDR)-H3s. We find that 270,000 (0.07% of 385 million) unique CDR-H3s are highly public in that they occur in at least five of 135 bioprojects. Of 700 unique therapeutic CDR-H3s, 6% have direct matches in the small set of 270,000. This observation extends to a match between CDR-H3 and V-gene call as well. Thus, the subspace of shared ('public') CDR-H3s shows utility as a starting point for therapeutic antibody design.
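The "highly public" criterion in the abstract (a CDR-H3 occurring in at least five of the 135 bioprojects) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `public_cdrh3s`, the bioproject identifiers, and the CDR-H3 strings are hypothetical toy values, and the sketch assumes exact string matching with each sequence counted once per bioproject. The last line only rechecks the 0.07% figure from the numbers quoted in the abstract.

```python
from collections import Counter


def public_cdrh3s(projects, min_projects=5):
    """Return CDR-H3s seen in at least `min_projects` distinct bioprojects.

    `projects` maps a bioproject ID to the collection of unique CDR-H3
    sequences observed in it (hypothetical input layout, assumed here).
    """
    counts = Counter()
    for cdrh3s in projects.values():
        # Count each sequence at most once per bioproject.
        counts.update(set(cdrh3s))
    return {seq for seq, n in counts.items() if n >= min_projects}


# Toy, made-up data: three hypothetical bioprojects and invented CDR-H3s.
toy_projects = {
    "P1": {"ARDYW", "ARGGYW"},
    "P2": {"ARDYW", "AKDLW"},
    "P3": {"ARDYW", "ARGGYW"},
}

# With only three toy bioprojects, a threshold of 3 stands in for the
# abstract's threshold of 5 out of 135.
toy_public = public_cdrh3s(toy_projects, min_projects=3)

# Fraction of (invented) therapeutic CDR-H3s with an exact match in the public set.
toy_therapeutics = {"ARDYW", "ARNNNW"}
matched_fraction = len(toy_therapeutics & toy_public) / len(toy_therapeutics)

# Arithmetic check using the figures reported in the abstract.
public_share_percent = 270_000 / 385_000_000 * 100
```

Counting presence per bioproject rather than raw read counts mirrors the abstract's definition, in which publicness depends on how many independent bioprojects contain a sequence and not on how abundant it is within any one of them.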
Pages: 12