Thresholding of Semantic Similarity Networks Using a Spectral Graph-Based Technique

Cited by: 3

Authors:

Guzzi, Pietro Hiram [1]

Veltri, Pierangelo [1]

Cannataro, Mario [1]

Affiliation:

[1] Magna Graecia Univ Catanzaro, Dept Med & Surg Sci, I-88100 Catanzaro, Italy

Source:

NEW FRONTIERS IN MINING COMPLEX PATTERNS, NFMCP 2013 | 2014 / Vol. 8399

Keywords:

Semantic similarity measures; Semantic similarity networks; Ontology

DOI:

10.1007/978-3-319-08407-7_13

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Semantic Similarity Measures (SSMs) evaluate the functional similarity among terms of an ontology. In computational biology, biological entities such as genes or proteins are usually annotated with terms extracted from the Gene Ontology (GO), and the most common application is to quantify the similarity or dissimilarity between two entities by applying SSMs to their annotations. More recently, the extensive application of SSMs has led to Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs in which the nodes are concepts (e.g. proteins) and each edge carries a weight that represents the semantic similarity between the pair of nodes it connects. Analysing SSNs with community detection algorithms, for tasks such as protein complex prediction or motif extraction, may reveal clusters of functionally associated proteins. Because SSNs contain a high number of low-weight arcs, likened to noise, classical clustering algorithms perform poorly on raw networks. A possible way to improve the performance of such algorithms is to simplify the structure of SSNs through a preprocessing step that deletes the arcs likened to noise. We therefore propose a novel preprocessing strategy that simplifies SSNs using a hybrid global-local thresholding approach based on spectral graph theory. As a proof of concept, we demonstrate that community detection algorithms applied to filtered (thresholded) networks yield more biologically relevant results than the same algorithms applied to raw, unfiltered networks.
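The abstract does not state the thresholding rule itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a symmetric weight matrix, a hypothetical local rule (keep an edge if its weight exceeds the mean plus `alpha` standard deviations of the weights incident to either endpoint), and a standard spectral check (the number of near-zero eigenvalues of the graph Laplacian equals the number of connected components). The names `local_threshold`, `count_components`, `alpha` and `tol` are illustrative assumptions.

```python
import numpy as np


def local_threshold(W, alpha=0.0):
    """Hypothetical local rule: keep edge (i, j) if its weight exceeds
    mean + alpha * std of the nonzero weights incident to i or to j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    cut = np.full(n, np.inf)  # isolated nodes keep nothing
    for i in range(n):
        incident = W[i][W[i] > 0]
        if incident.size:
            cut[i] = incident.mean() + alpha * incident.std()
    # An edge survives if it passes the cutoff of at least one endpoint.
    keep = (W > cut[:, None]) | (W > cut[None, :])
    filtered = np.where(keep, W, 0.0)
    np.fill_diagonal(filtered, 0.0)
    return filtered


def count_components(W, tol=1e-9):
    """Count connected components as the number of (near-)zero
    eigenvalues of the combinatorial Laplacian L = D - W."""
    W = np.asarray(W, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return int(np.sum(eigenvalues < tol))


# Toy SSN: two groups of three nodes, strong weights (0.9) inside each
# group and weak, noise-like weights (0.1) between the groups.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 0.0)

filtered = local_threshold(W)
raw_components = count_components(W)
filtered_components = count_components(filtered)
```

On this toy network the raw graph is a single connected component, while the filtered graph splits into the two dense groups, which is the kind of structure a community detection algorithm can then recover more easily.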

Pages: 201-213

Page count: 13
