A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce

Cited by: 48
Authors
Hu, Lun [1, 5]
Yang, Shicheng [2]
Luo, Xin [1, 3, 4]
Yuan, Huaqiang [1]
Sedraoui, Khaled [6, 7]
Zhou, MengChu [8]
Affiliations
[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523808, Peoples R China
[2] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Hubei, Peoples R China
[3] Chongqing Inst Green & Intelligent Technol, Chongqing Engn Res Ctr Big Data Applicat Smart Ci, Chongqing 400714, Peoples R China
[4] Chongqing Inst Green & Intelligent Technol, Chongqing Key Lab Big Data & Intelligent Comp, Chinese Acad Sci, Chongqing 400714, Peoples R China
[5] Xinjiang Tech Inst Phys & Chem, Chinese Acad Sci, Urumqi 830000, Peoples R China
[6] King Abdulaziz Univ, Ctr Res Excellence Renewable Energy & Power Syst, Jeddah 21589, Saudi Arabia
[7] King Abdulaziz Univ, Dept Elect & Comp Engn, Fac Engn, Jeddah 21589, Saudi Arabia
[8] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Funding
National Natural Science Foundation of China;
Keywords
Distributed computing; large-scale prediction; machine learning; MapReduce; protein-protein interaction (PPI); GENE ORDER; NETWORK; ALGORITHM; INFERENCE; MODEL;
DOI
10.1109/JAS.2021.1004198
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive protein-protein interaction (PPI) data have been generated, making them very difficult to analyze efficiently. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of CoFex's limitations is conducted from the perspectives of efficiency and memory consumption when it is applied to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge volume of protein sequence information. The procedure of CoFex is then modified to follow the MapReduce framework so that the prediction task runs in a distributed manner. Extensive experiments have been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework improves computational efficiency by more than two orders of magnitude while retaining the same high accuracy.
Pages: 160 - 172
Page count: 13
Related papers
50 in total
  • [1] Predicting Large-scale Protein-protein Interactions by Extracting Coevolutionary Patterns with MapReduce Paradigm
    Hu, Lun
    Zhao, Bo-Wei
    Yang, Shicheng
    Luo, Xin
    Zhou, MengChu
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 939 - 944
  • [2] Efficiently predicting large-scale protein-protein interactions using MapReduce
    Hu, Lun
    Yuan, Xiaohui
    Hu, Pengwei
    Chan, Keith C. C.
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2017, 69 : 202 - 206
  • [3] A MapReduce based parallel SVM for large-scale predicting protein-protein interactions
    You, Zhu-Hong
    Yu, Jian-Zhong
    Zhu, Lin
    Li, Shuai
    Wen, Zhen-Kun
    NEUROCOMPUTING, 2014, 145 : 37 - 43
  • [4] BioPlex Display: An Interactive Suite for Large-Scale AP-MS Protein-Protein Interaction Data
    Schweppe, Devin K.
    Huttlin, Edward L.
    Harper, J. Wade
    Gygi, Steven P.
    JOURNAL OF PROTEOME RESEARCH, 2018, 17 (01) : 722 - 726
  • [5] Large-Scale Multimedia Data Mining Using MapReduce Framework
    Wang, Hanli
    Shen, Yun
    Wang, Lei
    Zhufeng, Kuangtian
    Wang, Wei
    Cheng, Cheng
    2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2012,
  • [6] Using Topology Information for Protein-Protein Interaction Prediction
    Birlutiu, Adriana
    Heskes, Tom
    PATTERN RECOGNITION IN BIOINFORMATICS, PRIB 2014, 2014, 8626 : 10 - 22
  • [7] Large-Scale Protein-Protein Interactions Detection by Integrating Big Biosensing Data with Computational Model
    You, Zhu-Hong
    Li, Shuai
    Gao, Xin
    Luo, Xin
    Ji, Zhen
    BIOMED RESEARCH INTERNATIONAL, 2014, 2014
  • [8] Protein Complex Prediction in Large Ontology Attributed Protein-Protein Interaction Networks
    Zhang, Yijia
    Lin, Hongfei
    Yang, Zhihao
    Wang, Jian
    Li, Yanpeng
    Xu, Bo
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (03) : 729 - 741
  • [9] A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites
    Mou, Minjie
    Pan, Ziqi
    Zhou, Zhimeng
    Zheng, Lingyan
    Zhang, Hanyu
    Shi, Shuiyang
    Li, Fengcheng
    Sun, Xiuna
    Zhu, Feng
    RESEARCH, 2023, 6
  • [10] A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks
    Browne, Fiona
    Wang, Haiying
    Zheng, Huiru
    Azuaje, Francisco
    COMPUTERS IN BIOLOGY AND MEDICINE, 2010, 40 (03) : 306 - 317