Prediction of polyreactive and nonspecific single-chain fragment variables through structural biochemical features and protein language-based descriptors

Times cited: 6
Authors
Lim, Hocheol [1, 2]
No, Kyoung Tai [1, 2, 3]
Affiliations
[1] Yonsei Univ, Interdisciplinary Grad Program Integrat Biotechnol, Incheon 21983, South Korea
[2] Bioinformat & Mol Design Res Ctr BMDRC, Incheon 21983, South Korea
[3] Baobab AiBIO Co Ltd, Incheon 21983, South Korea
Keywords
Antibody design; Nonspecificity; Polyreactivity; Single-chain fragment variable; Machine Learning; Artificial Intelligence; HYDROPHOBIC INTERACTION CHROMATOGRAPHY; AGGREGATION; ANTIBODIES; RETENTION; STABILITY; POINT; TOOL;
DOI
10.1186/s12859-022-05010-4
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Monoclonal antibodies (mAbs) are used as therapeutic agents, but candidates discovered from in vitro display libraries must overcome many developability issues. In particular, polyreactive mAbs bind strongly to their specific target but also bind weakly to off-target proteins, which leads to poor pharmacokinetics in clinical development. Although assessing polyreactivity early in discovery is important, experimental assessments are usually time-consuming and expensive. Computational approaches for predicting the polyreactivity of single-chain fragment variables (scFvs) at the early discovery stage are therefore promising for reducing experimental effort.
Results: We built prediction models for scFv polyreactivity using known polyreactive antibody features and natural language model descriptors. We predicted the protein structures of 19,426 scFvs with trRosetta to calculate the polyreactive antibody features, and investigated the classification performance of each factor for polyreactivity. Among the known polyreactive features, the net charge of the CDR2 loop, the tryptophan and glycine residues in CDR-H3, and the lengths of the CDR1 and CDR2 loops contributed substantially to model performance. In addition, hydrodynamic features, such as partial specific volume, radius of gyration, and the isoelectric points of the CDR loops and scFvs, were newly added to improve model performance. Finally, we obtained a prediction model with robust performance (AUC = 0.840) by ensemble learning of the top 3 models.
Conclusion: The prediction models would help assess polyreactive scFvs at the early discovery stage, and our approach would be promising for developing machine learning models with quantitative data from high-throughput screening assays.
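The abstract names two ideas that are easy to illustrate: simple CDR-level descriptors (loop length, net charge, tryptophan and glycine counts) and an ensemble of the three best models scored by AUC. The sketch below is a minimal, dependency-free illustration of both, not the authors' pipeline. All function names are hypothetical, the net charge uses a crude integer rule (Lys/Arg = +1, Asp/Glu = -1, histidine ignored), and the ensemble is assumed to be an unweighted average of model scores, since the abstract does not say how the top 3 models were combined.

```python
from typing import Dict, List, Sequence, Tuple


def cdr_features(cdr: str) -> Dict[str, int]:
    """Crude sequence-level descriptors for one CDR loop (illustrative only)."""
    cdr = cdr.upper()
    positive = sum(cdr.count(aa) for aa in "KR")  # Lys, Arg counted as +1
    negative = sum(cdr.count(aa) for aa in "DE")  # Asp, Glu counted as -1
    return {
        "length": len(cdr),
        "net_charge": positive - negative,  # assumption: histidine ignored
        "n_trp": cdr.count("W"),
        "n_gly": cdr.count("G"),
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def soft_vote(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample scores across models (unweighted soft voting)."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


def top_k_ensemble(
    labels: Sequence[int],
    model_scores: Dict[str, Sequence[float]],
    k: int = 3,
) -> Tuple[List[str], List[float]]:
    """Rank models by their individual AUC and average the k best."""
    ranked = sorted(
        model_scores,
        key=lambda name: roc_auc(labels, model_scores[name]),
        reverse=True,
    )
    chosen = ranked[:k]
    return chosen, soft_vote([model_scores[name] for name in chosen])
```

In practice the models would be ranked on a validation split and the ensemble AUC reported on a separate test split; ranking and scoring on the same data, as a quick call to `top_k_ensemble` would do, gives an optimistic estimate.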
Pages: 19