A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences

Cited by: 2
Authors
Tsarouchis, Sotirios-Filippos [1]
Kotouza, Maria Th [1]
Psomopoulos, Fotis E. [1, 2]
Mitkas, Pericles A. [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Elect & Comp Engn, Thessaloniki 54124, Greece
[2] Ctr Res & Technol Hellas, Inst Appl Biosci, Thessaloniki 57001, Greece
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018 | 2018 / Vol. 520
Keywords
Hierarchical clustering; Amino acid sequences; Sequence similarity; Sequence identity
DOI
10.1007/978-3-319-92016-0_18
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting both in tracing the evolutionary history of the respective molecules and in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of a subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method uses the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied to a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from chronic lymphocytic leukemia (CLL) patients, showing promising results.
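To make the two metrics named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that identity is the per-column fraction of the most frequent amino acid, that similarity is the per-column fraction of the most frequent amino-acid group, and that one top-down step splits a set on its least-conserved column. The grouping table and the function names `column_scores`, `mean_scores`, and `split_once` are hypothetical and chosen only for illustration; the abstract does not specify any of them.

```python
from collections import Counter

# Assumed physicochemical grouping (illustrative only; not taken from the paper).
GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "basic": "KR",
    "acidic": "DE",
    "special": "GP",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def column_scores(seqs):
    """Return a list of (identity, similarity) fractions, one pair per column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    scores = []
    for column in zip(*seqs):
        # Count of the most frequent residue in this column.
        top_aa = Counter(column).most_common(1)[0][1]
        # Count of the most frequent residue group; unknown symbols form their own group.
        groups = Counter(AA_TO_GROUP.get(aa, aa) for aa in column)
        top_group = groups.most_common(1)[0][1]
        scores.append((top_aa / len(column), top_group / len(column)))
    return scores


def mean_scores(seqs):
    """Average identity and similarity over all columns."""
    scores = column_scores(seqs)
    n = len(scores)
    return (sum(i for i, _ in scores) / n, sum(s for _, s in scores) / n)


def split_once(seqs):
    """Assumed split rule: take the column with the lowest (identity, similarity)
    pair and separate sequences carrying its most common residue from the rest."""
    scores = column_scores(seqs)
    col = min(range(len(scores)), key=lambda j: scores[j])
    top = Counter(s[col] for s in seqs).most_common(1)[0][0]
    left = [s for s in seqs if s[col] == top]
    right = [s for s in seqs if s[col] != top]
    return col, left, right


if __name__ == "__main__":
    toy = ["ARDK", "ARDR", "ARDK", "ARES"]  # toy input, not real IG data
    print(mean_scores(toy))
    print(split_once(toy))
```

Applying `split_once` recursively to each resulting subset would yield a binary hierarchy; a stopping rule (for example, a minimum identity threshold) would be a further assumption.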
Pages: 189-199
Page count: 11