Insights from analyses of low complexity regions with canonical methods for protein sequence comparison

Times cited: 10
Authors
Jarnot, Patryk [1]
Ziemska-Legiecka, Joanna [2]
Grynberg, Marcin [2]
Gruca, Aleksandra [1]
Affiliations
[1] Silesian Tech Univ, Dept Comp Networks & Syst, Akad 2A, PL-44100 Gliwice, Poland
[2] PAS, Inst Biochem & Biophys, Warsaw, Poland
Keywords
comparison methods; low complexity regions; protein sequence similarity; GLYCERALDEHYDE-3-PHOSPHATE DEHYDROGENASE; IDENTIFICATION; ALIGNMENT; PEPTIDES; MATRICES; PHOSPHOPROTEIN; REPEATS; GENE
DOI
10.1093/bib/bbac299
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions occur frequently in proteins and can play important roles in their functions. However, research has focused mainly on regions with a highly diverse amino acid composition. Similarity between regions of protein sequences frequently reflects functional similarity between them. In this article, we discuss the strengths and weaknesses of similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered the gold standard in protein similarity analysis, but they were designed for comparing high complexity regions, and specialized methods for comparing low complexity regions are lacking. We therefore investigated how the existing methods can be applied to compare such regions. Our results are supported by an exploratory study and by a discussion of the amino acid composition and biological roles of selected examples. We show that the existing methods need improvement to search efficiently for similar low complexity regions. We identify the features that need to be redesigned specifically for comparing low complexity regions: the scoring matrix, multiple sequence alignment, e-value, local alignment, and clustering based on a set of representative sequences. The results of this analysis can be used either to improve existing methods or to create new methods for the similarity analysis of low complexity regions.
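To make the phrase "composed of only a few types of amino acids" concrete, the sketch below scores each fixed-length window of a sequence by the Shannon entropy of its residue composition and merges low-entropy windows into candidate regions. This is an illustrative assumption, not the authors' procedure and not SEG (the masking tool used by BLAST, which applies its own complexity measure); the window length of 12, the threshold of 2.2 bits, and the helper names `shannon_entropy`, `low_complexity_windows` and `merge_windows` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Yield (start, end, entropy) for windows whose entropy is below threshold.

    Hypothetical parameters: window and threshold are illustrative values,
    not taken from the article.
    """
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        h = shannon_entropy(segment)
        if h < threshold:
            yield start, start + window, h


def merge_windows(hits):
    """Merge overlapping or touching low-entropy windows into (start, end) regions."""
    regions = []
    for start, end, _ in hits:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    # Toy sequence: a diverse stretch, a poly-Q tract, then another diverse stretch.
    seq = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 12 + "LEDGRTLSDYNIQKESTLHLVLRLRGG"
    print(merge_windows(low_complexity_windows(seq)))
```

A window made of a single residue type has entropy 0, while a window of 12 distinct residues reaches log2(12), about 3.58 bits, so a low threshold singles out compositionally biased stretches such as the poly-Q tract. Entropy ignores residue order, which is one reason composition-based detection and alignment-based similarity search can disagree on such regions.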
Pages: 13
Related papers
50 in total
  • [2] Relationship of Sequence and Phase Separation in Protein Low Complexity Regions
    Martin, Erik W.
    Mittag, Tanja
    BIOCHEMISTRY, 2018, 57 (17) : 2478 - 2487
  • [3] A Novel algorithm for identifying low-complexity regions in a protein sequence
    Li, Xuehui
    Kahveci, Tamer
    BIOINFORMATICS, 2006, 22 (24) : 2980 - 2987
  • [4] Terminal regions of a protein are a hotspot for low complexity regions and selection
    Teekas, Lokdeep
    Sharma, Sandhya
    Vijay, Nagarjun
    OPEN BIOLOGY, 2024, 14 (06)
  • [5] Unravelling the relationship between protein sequence and low-complexity regions entropies: Interactome implications
    Martins, F.
    Goncalves, R.
    Oliveira, J.
    Cruz-Monteagudo, M.
    Nieto-Villar, J. M.
    Paz-y-Mino, C.
    Rebelo, I.
    Tejera, E.
    JOURNAL OF THEORETICAL BIOLOGY, 2015, 382 : 320 - 327
  • [6] Sequence Determines the Switch in the Fibril Forming Regions in the Low-Complexity FUS Protein and Its Variants
    Kumar, Abhinaw
    Chakraborty, Debayan
    Mugnai, Mauro Lorenzo
    Straub, John E.
    Thirumalai, D.
    JOURNAL OF PHYSICAL CHEMISTRY LETTERS, 2021, 12 (37) : 9026 - 9032
  • [7] Comparison of sequence masking algorithms and the detection of biased protein sequence regions
    Kreil, DP
    Ouzounis, CA
    BIOINFORMATICS, 2003, 19 (13) : 1672 - 1681
  • [8] Molecular Mechanisms of Low Complexity Sequence Protein Assembly
    Wittmer, Yuuki
    Fonda, Blake
    Stowell, Rachelle
    Boulos, Natalie
    Rafique, Rebecca
    Hu, Rong
    Truc Le
    Murray, Dylan T.
    BIOPHYSICAL JOURNAL, 2020, 118 (03) : 214A - 215A
  • [9] PROTEIN-SEQUENCE COMPARISON - METHODS AND SIGNIFICANCE
    ARGOS, P
    VINGRON, M
    VOGT, G
    PROTEIN ENGINEERING, 1991, 4 (04) : 375 - 383
  • [10] How do protein domains of low sequence complexity work?
    Kato, Masato
    Zhou, Xiaoming
    McKnight, Steven L.
    RNA, 2022, 28 (01) : 3 - 15