Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique

Cited by: 3
Authors
Guzzi, Pietro Hiram [1 ]
Veltri, Pierangelo [1 ]
Cannataro, Mario [1 ]
Affiliations
[1] Magna Graecia Univ Catanzaro, Dept Med & Surg Sci, I-88100 Catanzaro, Italy
Source
NEW FRONTIERS IN MINING COMPLEX PATTERNS, NFMCP 2013 | 2014, Vol. 8399
Keywords
Semantic similarity measures; Semantic similarity networks; Ontology
DOI
10.1007/978-3-319-08407-7_13
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic Similarity Measures (SSMs) evaluate the functional similarity among terms of an ontology. In computational biology, biological entities such as genes or proteins are usually annotated with terms drawn from the Gene Ontology (GO), and the most common application is to quantify the similarity or dissimilarity between two entities by applying SSMs to their annotations. More recently, the extensive application of SSMs has given rise to Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs in which the nodes are concepts (e.g. proteins) and each edge carries a weight representing the semantic similarity between the pair of nodes it connects. Community detection algorithms applied to SSNs, for tasks such as protein complex prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs contain a high number of low-weight arcs, which act like noise, classical clustering algorithms perform poorly on the raw networks. One way to improve their performance is to simplify the structure of SSNs through a preprocessing step that deletes the noise-like arcs. We therefore propose a novel preprocessing strategy that simplifies SSNs using a hybrid global-local thresholding approach based on spectral graph theory. As a proof of concept, we demonstrate that community detection algorithms applied to filtered (thresholded) networks yield more biologically relevant results than the same algorithms applied to raw, unfiltered networks.
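The abstract does not spell out the thresholding rule, so the following is only a minimal sketch of what a hybrid global-local filter with a spectral check might look like, not the authors' method. It assumes a symmetric weight matrix `W`; the function names `hybrid_threshold` and `count_components`, the parameters `global_cut` and `k`, and the local rule (mean plus `k` standard deviations of a node's incident weights) are all hypothetical choices made for illustration. The only spectral fact relied on is standard: the multiplicity of the zero eigenvalue of the graph Laplacian equals the number of connected components.

```python
import numpy as np

def hybrid_threshold(W, global_cut, k=1.0):
    """Keep edge (i, j) if its weight passes a global cut OR a local,
    per-node cut. Both rules are illustrative assumptions."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    local_cut = np.full(n, np.inf)
    for i in range(n):
        w = W[i][W[i] > 0]                  # weights incident to node i
        if w.size:
            local_cut[i] = w.mean() + k * w.std()
    keep_global = W >= global_cut
    # An edge passes locally if it is strong for either endpoint.
    keep_local = (W >= local_cut[:, None]) | (W >= local_cut[None, :])
    keep = (keep_global | keep_local) & (W > 0)
    return np.where(keep, W, 0.0)

def count_components(W, tol=1e-8):
    """Count connected components as the number of (near-)zero
    eigenvalues of the graph Laplacian L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)         # L is symmetric
    return int(np.sum(eigvals < tol))

# Toy network: two groups of three nodes, strong weights (0.9) inside
# each group and weak, noise-like weights (0.1) between groups.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

filtered = hybrid_threshold(W, global_cut=0.5)
print(count_components(W), count_components(filtered))
```

On this toy input the raw network is a single connected block, while the filtered one separates into the two dense groups, which is the kind of simplification a downstream clustering step would benefit from.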
Pages: 201-213
Page count: 13