FuzzyPPI: Large-Scale Interaction of Human Proteome at Fuzzy Semantic Space

Cited by: 0

Authors:

Halder, Anup Kumar ^{[1,2]}

Bandyopadhyay, Soumyendu Sekhar ^{[3,4]}

Jedrzejewski, Witold ^{[5]}

Basu, Subhadip ^{[3]}

Sroka, Jacek ^{[5]}

Affiliations:

[1] Warsaw Univ Technol, Fac Math & Informat Sci, PL-00661 Warsaw, Poland

[2] Univ Warsaw, Ctr New Technol, PL-00927 Warsaw, Poland

[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, W Bengal, India

[4] Univ Engn & Management, Inst Engn & Management, Dept Informat Technol, Kolkata 700091, W Bengal, India

[5] Univ Warsaw, Inst Informat, PL-00927 Warsaw, Poland

Source:

IEEE TRANSACTIONS ON BIG DATA | 2025 / Vol. 11 / Issue 01

Keywords:

Proteins; Semantics; Annotations; Organisms; Benchmark testing; Databases; Ontologies; Protein-protein interaction; gene ontology; fuzzy semantic score; large-scale graph analysis; apache spark-based distributed computation; INTERACTION NETWORK; SIMILARITY; PREDICTION; DATABASE; PERFORMANCE; FAMILY;

DOI:

10.1109/TBDATA.2024.3375149

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

The large-scale protein-protein interaction (PPI) network of an organism provides key insights into its cellular and molecular functionalities, signaling pathways, and underlying disease mechanisms. For any organism, the unexplored protein interactions significantly outnumber all known positive and negative interactions. For Human, all known PPI datasets contain only ~5.61 million positive and ~0.76 million negative interactions, which is ~3.1% of potential interactions. We have implemented a distributed algorithm in Apache Spark that evaluates a Human PPI network of ~180 million potential interactions resulting from 18,994 reviewed proteins for which Gene Ontology (GO) annotations are available. The computed scores have been validated against state-of-the-art methods on benchmark datasets. FuzzyPPI performed significantly better, with an average F1 score of 0.62 compared to GOntoSim (0.39), GOGO (0.38), and Wang (0.38), when tested with the Gold Standard PPI Dataset. The resulting scores are published with a web server for non-commercial use at http://fuzzyppi.mimuw.edu.pl/. Moreover, conventional PPI prediction methods produce binary results, which is a simplification: PPIs have strengths or probabilities, and recent studies show that protein binding affinities may prove effective in detecting protein complexes, disease association analysis, signaling network reconstruction, etc. With this in mind, our algorithm is based on a fuzzy semantic scoring function and produces probabilities of interaction.
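The two figures in the abstract that lend themselves to a quick check are the size of the candidate interaction space and the F1 metric used for validation. The following is a minimal sketch in Python, not the authors' Spark implementation. The function names `count_unordered_pairs` and `f1_at_threshold`, the 0.5 cut-off, and the toy scores and labels are all assumptions made for illustration; none of them come from the paper.

```python
from math import comb


def count_unordered_pairs(n_proteins: int) -> int:
    """Number of distinct unordered protein pairs (self-pairs excluded)."""
    return comb(n_proteins, 2)


def f1_at_threshold(scores, labels, threshold=0.5):
    """F1 score after binarizing probabilistic scores at an assumed threshold.

    scores: interaction probabilities in [0, 1]
    labels: 1 for a known interacting pair, 0 for a known non-interacting pair
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 18,994 reviewed proteins give roughly 180 million candidate pairs.
    print(count_unordered_pairs(18_994))  # 180376521

    # Hypothetical scores and labels, made up purely to exercise the function.
    toy_scores = [0.9, 0.8, 0.3, 0.6, 0.1]
    toy_labels = [1, 1, 1, 0, 0]
    print(f1_at_threshold(toy_scores, toy_labels))
```

The pair count agrees with the "~180 million potential interactions" stated in the abstract. The thresholding step is only needed when probabilistic scores are compared against binary benchmark labels; how the paper chooses its cut-off is not stated in this record.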


Pages: 47-58

Page count: 12

