A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce

Cited by: 1
Authors
Lun Hu [1 ,2 ,3 ]
Shicheng Yang [4 ]
Xin Luo [1 ,2 ,5 ]
Huaqiang Yuan [2 ]
Khaled Sedraoui [6 ]
MengChu Zhou [1 ,7 ]
Affiliations
[1] IEEE
[2] the School of Computer Science and Technology, Dongguan University of Technology
[3] Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences
[4] the School of Computer Science and Technology, Wuhan University of Technology
[5] the Chongqing Engineering Research Center of Big Data Application for Smart Cities, and Chongqing Key Laboratory of Big Data and Intelligent Computing, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences
[6] the Center of Research Excellence in Renewable Energy and Power Systems, and the Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University
[7] the Department of Electrical and Computer Engineering, New Jersey Institute of Technology
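Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q811.4 [Bioinformatics theory]; TP181 [Automated reasoning, machine learning];
Discipline classification code
0711 ; 0831 ;
Abstract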
Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making efficient analysis very difficult. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of the limitations of CoFex is conducted from the perspectives of efficiency and memory consumption when it is applied to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge volume of protein sequence information. After that, the procedure of CoFex is modified to follow the MapReduce framework so that the prediction task is carried out in a distributed manner. A series of extensive experiments have been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework improves the computational efficiency of CoFex by more than two orders of magnitude while retaining the same high accuracy.
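The two ideas named in the abstract, a tree-based structure for sequence information and a map/reduce split of the work, can be illustrated with a small single-process sketch. This is not the CoFex algorithm or the authors' implementation, whose details the abstract does not give. The names `KmerTrie`, `map_pair`, `reduce_counts`, and `run`, the choice of k-mers as features, and the co-occurrence counting are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


class KmerTrie:
    """Hypothetical prefix tree: stores each distinct k-mer once and
    maps it to a small integer id, so shared prefixes share nodes."""

    def __init__(self):
        self.root = {}
        self.size = 0

    def insert(self, kmer):
        node = self.root
        for ch in kmer:
            node = node.setdefault(ch, {})
        if "$id" not in node:          # first time this k-mer is seen
            node["$id"] = self.size
            self.size += 1
        return node["$id"]


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def map_pair(pair, sequences, trie, k):
    """Map step (assumed form): for one interacting protein pair, emit
    ((id_a, id_b), 1) for every k-mer pair that co-occurs across it."""
    a, b = pair
    ids_a = {trie.insert(m) for m in kmers(sequences[a], k)}
    ids_b = {trie.insert(m) for m in kmers(sequences[b], k)}
    for ia, ib in product(ids_a, ids_b):
        yield (min(ia, ib), max(ia, ib)), 1


def reduce_counts(emitted):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


def run(pairs, sequences, k=2):
    # Single-process stand-in for a cluster: a real MapReduce job would
    # run map_pair on many workers and shuffle the keys to reducers.
    trie = KmerTrie()
    emitted = [kv for pair in pairs
               for kv in map_pair(pair, sequences, trie, k)]
    return trie, reduce_counts(emitted)


if __name__ == "__main__":
    seqs = {"P1": "ACD", "P2": "CDE", "P3": "ACD"}
    trie, counts = run([("P1", "P2"), ("P3", "P2")], seqs, k=2)
    print(trie.size, counts)
```

In this toy version the trie is shared in memory by every map call; on an actual cluster the id mapping would have to be built beforehand and distributed to the workers, which the sketch does not attempt.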
Pages: 160 - 172
Number of pages: 13
Related papers
50 in total
  • [1] A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce
    Hu, Lun
    Yang, Shicheng
    Luo, Xin
    Yuan, Huaqiang
    Sedraoui, Khaled
    Zhou, MengChu
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (01) : 160 - 172
  • [2] Analysis and application of large-scale protein-protein interaction data sets
    Sun, JC
    Xu, JL
    Li, YX
    Shi, TL
    CHINESE SCIENCE BULLETIN, 2005, 50 (20) : 2267 - 2272
  • [3] Analysis and application of large-scale protein-protein interaction data sets
    SUN Jingchun
    Chinese Science Bulletin, 2005, (20) : 13 - 18
  • [4] Efficiently predicting large-scale protein-protein interactions using MapReduce
    Hu, Lun
    Yuan, Xiaohui
    Hu, Pengwei
    Chan, Keith C. C.
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2017, 69 : 202 - 206
  • [5] Large-scale Protein-Protein Interaction prediction using novel kernel methods
    Chen, Xue-wen
    Han, Bing
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2008, 2 (02) : 145 - 156
  • [6] A MapReduce based parallel SVM for large-scale predicting protein-protein interactions
    You, Zhu-Hong
    Yu, Jian-Zhong
    Zhu, Lin
    Li, Shuai
    Wen, Zhen-Kun
    NEUROCOMPUTING, 2014, 145 : 37 - 43
  • [7] Large-scale prediction of protein-protein interactions from structures
    Martial Hue
    Michael Riffle
    Jean-Philippe Vert
    William S Noble
    BMC Bioinformatics, 11
  • [8] Large-scale prediction of protein-protein interactions from structures
    Hue, Martial
    Riffle, Michael
    Vert, Jean-Philippe
    Noble, William S.
    BMC BIOINFORMATICS, 2010, 11
  • [9] Predicting Large-scale Protein-protein Interactions by Extracting Coevolutionary Patterns with MapReduce Paradigm
    Hu, Lun
    Zhao, Bo-Wei
    Yang, Shicheng
    Luo, Xin
    Zhou, MengChu
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021 : 939 - 944
  • [10] Prediction of protein function using protein-protein interaction data
    Deng, MH
    Zhang, K
    Mehta, S
    Chen, T
    Sun, FZ
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (06) : 947 - 960