Seq2Phase: language model-based accurate prediction of client proteins in liquid-liquid phase separation

Times cited: 2
Authors
Miyata, Kazuki [1, 7]
Iwasaki, Wataru [1, 2, 3, 4, 5, 6, 7]
Affiliations
[1] Univ Tokyo, Grad Sch Sci, Dept Biol Sci, Bunkyo Ku, Tokyo 1130032, Japan
[2] Univ Tokyo, Grad Sch Frontier Sci, Dept Integrated Biosci, Chiba 2770882, Japan
[3] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol & Med Sci, Chiba 2770882, Japan
[4] Univ Tokyo, Atmosphere & Ocean Res Inst, Kashiwa, Chiba 2770882, Japan
[5] Univ Tokyo, Inst Quantitat Biosci, Bunkyo Ku, Tokyo 1130032, Japan
[6] Univ Tokyo, Collaborat Res Inst Innovat Microbiol, Bunkyo Ku, Tokyo 1130032, Japan
[7] 5-1-5 Kashiwanoha, Kashiwa, Chiba 2770882, Japan
Source
BIOINFORMATICS ADVANCES | 2024 / Vol. 4 / Issue 1
Keywords
CD-HIT; TRANSITION; COMPLEXITY; DROPLETS; SIZE; LIFE
DOI
10.1093/bioadv/vbad189
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Motivation: Liquid-liquid phase separation (LLPS) enables compartmentalization in cells without biological membranes. LLPS plays essential roles in membraneless organelles such as nucleoli and P-bodies, helps regulate cellular physiology, and is linked to amyloid formation. Two types of proteins, scaffolds and clients, are involved in LLPS. However, computational methods for predicting LLPS client proteins from amino-acid sequences remain underdeveloped.
Results: Here, we present Seq2Phase, an accurate predictor of LLPS client proteins. Information-rich features are extracted from amino-acid sequences by a Transformer-based language model, a deep-learning technique, and fed into supervised machine learning. The predicted client proteins included known LLPS regulators and were enriched for localization to membraneless organelles, confirming the validity of the predictions. Feature analysis revealed that scaffolds and clients have different sequence properties and that textbook knowledge of LLPS-related proteins is biased and incomplete. Seq2Phase achieved high accuracy across human, mouse, yeast, and plant, showing that the method is not overfitted to specific species and is broadly applicable. We predict that hundreds to thousands or more LLPS client proteins remain undiscovered in each species, and that Seq2Phase will advance our understanding of the still-enigmatic molecular and physiological bases of LLPS as well as its roles in disease.
Availability and implementation: The Python code underlying this article is available at https://github.com/IwasakiLab/Seq2Phase.
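The Results section describes a two-stage design. First, a Transformer-based language model turns each amino-acid sequence into features. Second, those features are fed into a supervised classifier. The dependency-free Python sketch below illustrates only that general shape and rests on several assumptions:

- Per-residue embeddings are mean-pooled into one vector per protein.
- The classifier is logistic regression trained by gradient descent.
- The "embeddings" are random synthetic vectors produced by a hypothetical stand-in function.

All of these choices are assumptions, and all names (`toy_residue_embeddings`, `mean_pool`, `train_logistic`, `predict_proba`) are hypothetical. None of them should be read as Seq2Phase's actual language model, pooling scheme, or classifier. For those details, consult the authors' repository.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: Sequence[Vector]) -> Vector:
    """Average per-residue vectors into one fixed-length protein vector (assumed pooling)."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / n for j in range(dim)]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Vector, b: float, x: Vector) -> float:
    """Predicted probability that a protein belongs to class 1 ("client" in this toy setup)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: Sequence[Vector], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[Vector, float]:
    """Fit logistic regression by full-batch gradient descent on the mean log-loss."""
    dim = len(X[0])
    n = len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def toy_residue_embeddings(length: int, shift: float, rng: random.Random) -> List[Vector]:
    """HYPOTHETICAL stand-in for a protein language model: random 4-d vectors per residue."""
    return [[rng.gauss(shift, 1.0) for _ in range(4)] for _ in range(length)]


# Synthetic, balanced toy dataset: label 1 = "client", label 0 = "non-client".
rng = random.Random(0)
proteins: List[Vector] = []
labels: List[int] = []
for i in range(40):
    label = i % 2
    residues = toy_residue_embeddings(rng.randint(50, 150), 1.0 if label else -1.0, rng)
    proteins.append(mean_pool(residues))
    labels.append(label)

w, b = train_logistic(proteins, labels)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in proteins]
print("training accuracy:", sum(p == t for p, t in zip(preds, labels)) / len(labels))
```

Pooling gives every protein a fixed-length vector regardless of sequence length, which is why a standard classifier can consume it. On real data, replace `toy_residue_embeddings` with output from an actual protein language model. The sketch reports training accuracy only, which says nothing about generalization. A realistic evaluation would score proteins held out from training, ideally after redundancy reduction with a tool such as CD-HIT.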
Pages: 11