Efficiently predicting large-scale protein-protein interactions using MapReduce

Cited by: 10
Authors
Hu, Lun [1]
Yuan, Xiaohui [1]
Hu, Pengwei [2]
Chan, Keith C. C. [2]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan, Hubei, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Large-scale protein-protein interactions; Prediction; MapReduce; Efficiency; DATABASE; UPDATE;
DOI
10.1016/j.compbiolchem.2017.03.009
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
With the rapid development of high-throughput genomic technologies, a vast amount of protein-protein interaction (PPI) data has been generated for different species. However, the set of known PPIs is still rather small compared with all possible PPIs. Hence, there is a need to develop computational algorithms specifically for large-scale PPI prediction. In response to this need, we propose a parallel algorithm, namely pVLASPD, to perform the prediction task in a distributed manner. In particular, pVLASPD is a modification of the VLASPD algorithm that aims to improve its efficiency while maintaining comparable effectiveness. To do so, we first analyzed VLASPD step by step to identify the steps that caused efficiency bottlenecks. After that, pVLASPD was developed by parallelizing those inefficient steps within the MapReduce framework. Extensive experimental results demonstrate the promising performance of pVLASPD when applied to the prediction of large-scale PPIs. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 202-206
Number of pages: 5