Selection of target-binding proteins from the information of weakly enriched phage display libraries by deep sequencing and machine learning

Cited by: 7
Authors
Ito, Tomoyuki [1]
Nguyen, Thuy Duong [2]
Saito, Yutaka [2,3,4,5]
Kurumida, Yoichi [2]
Nakazawa, Hikaru [1]
Kawada, Sakiya [1]
Nishi, Hafumi [6,7,8]
Tsuda, Koji [4,5,9]
Kameda, Tomoshi [2,5]
Umetsu, Mitsuo [1,5]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Dept Biomol Engn, Sendai, Japan
[2] Natl Inst Adv Ind Sci & Technol, Artificial Intelligence Res Ctr, Tokyo, Japan
[3] Waseda Univ, AIST, Computat Bio Big Data Open Innovat Lab CBBD OIL, Tokyo, Japan
[4] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol & Med Sci, Tokyo, Japan
[5] RIKEN, Ctr Adv Intelligence Project, Tokyo, Japan
[6] Tohoku Univ, Grad Sch Informat Sci, Dept Appl Informat Sci, Sendai, Japan
[7] Tohoku Univ, Tohoku Med Megabank Org, Sendai, Japan
[8] Ochanomizu Univ, Fac Core Res, Tokyo, Japan
[9] Natl Inst Mat Sci, Res & Serv Div Mat Data & Integrated Syst, Tsukuba, Japan
Funding
Japan Science and Technology Agency; Japan Society for the Promotion of Science
Keywords
Machine learning; antibody mimetics; directed evolution; deep sequencing analysis; phage display; DIAGNOSIS; EVOLUTION; PEPTIDES; AFFINITY;
DOI
10.1080/19420862.2023.2168470
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Despite advances in surface-display systems for directed evolution, variants with high affinity are not always enriched, because undesirable biases increase target-unrelated variants during biopanning. Here, our goal was to design a library containing improved variants using information from a "weakly enriched" library, in which functional variants were only weakly enriched. Deep sequencing of a previous biopanning result, in which no functional antibody mimetics were experimentally identified, revealed that the weak enrichment was partly due to undesirable biases during the phage infection and amplification steps. Clustering analysis of the deep sequencing data from appropriate steps revealed no distinct sequence patterns, but a Bayesian machine learning model trained with the selected deep sequencing data yielded nine clusters with distinct sequence patterns. Phage libraries were designed on the basis of the identified sequence patterns, and four improved variants with target-specific affinity (EC50 = 80-277 nM) were identified by biopanning. Selecting and using deep sequencing data free of undesirable bias enabled us to extract information on prospective variants. In summary, appropriate deep sequencing data combined with machine learning on the sequence data can potentially identify sequence space in which functional variants are enriched.
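As a rough illustration of what "enrichment" means when comparing deep sequencing data from two biopanning steps, the following sketch computes a per-variant log2 enrichment ratio from read lists. It is not the authors' pipeline: the function name `log2_enrichment`, the pseudocount smoothing, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
import math


def log2_enrichment(before_reads, after_reads, pseudocount=1.0):
    """Return {variant: log2(freq_after / freq_before)}.

    Hypothetical helper for illustration only. A pseudocount is added to
    every variant's count so that variants missing from one step do not
    cause a division by zero.
    """
    before = Counter(before_reads)
    after = Counter(after_reads)
    variants = set(before) | set(after)

    # Smoothed totals: observed reads plus one pseudocount per variant.
    n_before = sum(before.values()) + pseudocount * len(variants)
    n_after = sum(after.values()) + pseudocount * len(variants)

    scores = {}
    for seq in variants:
        f_before = (before[seq] + pseudocount) / n_before
        f_after = (after[seq] + pseudocount) / n_after
        scores[seq] = math.log2(f_after / f_before)
    return scores


# Toy data (made up): "AAA" gains reads after the step, "CCC" loses them.
before_step = ["AAA"] * 2 + ["CCC"] * 2
after_step = ["AAA"] * 3 + ["CCC"] * 1
scores = log2_enrichment(before_step, after_step)
```

A positive score means the variant's share of reads grew across the step and a negative score means it shrank. Under this reading, a bias introduced during infection or amplification would appear as enrichment across a step that involves no target binding.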
Pages: 11
Related papers
42 in total
  • [1] Unified rational protein engineering with sequence-based deep representation learning
    Alley, Ethan C.
    Khimulya, Grigory
    Biswas, Surojit
    AlQuraishi, Mohammed
    Church, George M.
    [J]. NATURE METHODS, 2019, 16 (12) : 1315 - +
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] UniProt: the universal protein knowledgebase in 2021
    Bateman, Alex
    Martin, Maria-Jesus
    Orchard, Sandra
    Magrane, Michele
    Agivetova, Rahat
    Ahmad, Shadab
    Alpi, Emanuele
    Bowler-Barnett, Emily H.
    Britto, Ramona
    Bursteinas, Borisas
    Bye-A-Jee, Hema
    Coetzee, Ray
    Cukura, Austra
    Da Silva, Alan
    Denny, Paul
    Dogan, Tunca
    Ebenezer, ThankGod
    Fan, Jun
    Castro, Leyla Garcia
    Garmiri, Penelope
    Georghiou, George
    Gonzales, Leonardo
    Hatton-Ellis, Emma
    Hussein, Abdulrahman
    Ignatchenko, Alexandr
    Insana, Giuseppe
    Ishtiaq, Rizwan
    Jokinen, Petteri
    Joshi, Vishal
    Jyothi, Dushyanth
    Lock, Antonia
    Lopez, Rodrigo
    Luciani, Aurelien
    Luo, Jie
    Lussi, Yvonne
    Mac-Dougall, Alistair
    Madeira, Fabio
    Mahmoudy, Mahdi
    Menchi, Manuela
    Mishra, Alok
    Moulang, Katie
    Nightingale, Andrew
    Oliveira, Carla Susana
    Pundir, Sangya
    Qi, Guoying
    Raj, Shriya
    Rice, Daniel
    Lopez, Milagros Rodriguez
    Saidi, Rabie
    Sampson, Joseph
    [J]. NUCLEIC ACIDS RESEARCH, 2021, 49 (D1) : D480 - D489
  • [4] Machine learning-guided channelrhodopsin engineering enables minimally invasive optogenetics
    Bedbrook, Claire N.
    Yang, Kevin K.
    Robinson, J. Elliott
    Mackey, Elisha D.
    Gradinaru, Viviana
    Arnold, Frances H.
    [J]. NATURE METHODS, 2019, 16 (11) : 1176 - +
  • [5] Low-N protein engineering with data-efficient deep learning
    Biswas, Surojit
    Khimulya, Grigory
    Alley, Ethan C.
    Esvelt, Kevin M.
    Church, George M.
    [J]. NATURE METHODS, 2021, 18 (04) : 389 - +
  • [6] The making of bispecific antibodies
    Brinkmann, Ulrich
    Kontermann, Roland E.
    [J]. MABS, 2017, 9 (02) : 182 - 212
  • [7] A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes
    Cadet, Frederic
    Fontaine, Nicolas
    Li, Guangyue
    Sanchis, Joaquin
    Chong, Matthieu Ng Fuk
    Pandjaitan, Rudy
    Vetrivel, Iyanar
    Offmann, Bernard
    Reetz, Manfred T.
    [J]. SCIENTIFIC REPORTS, 2018, 8
  • [8] High-throughput screening of biomolecules using cell-free gene expression systems
    Contreras-Llano, Luis E.
    Tan, Cheemeng
    [J]. SYNTHETIC BIOLOGY, 2018, 3 (01)
  • [9] WebLogo: A sequence logo generator
    Crooks, GE
    Hon, G
    Chandonia, JM
    Brenner, SE
    [J]. GENOME RESEARCH, 2004, 14 (06) : 1188 - 1190
  • [10] Galectin-3 as a novel biomarker for disease diagnosis and a target for therapy
    Dong, Rui
    Zhang, Min
    Hu, Qunying
    Zheng, Shan
    Soh, Andrew
    Zheng, Yijie
    Yuan, Hui
    [J]. INTERNATIONAL JOURNAL OF MOLECULAR MEDICINE, 2018, 41 (02) : 599 - 614