iProX in 2021: connecting proteomics data sharing with big data

Cited by: 546
Authors
Chen, Tao [1]
Ma, Jie [1]
Liu, Yi [1]
Chen, Zhiguang [2]
Xiao, Nong [2]
Lu, Yutong [2]
Fu, Yinjin [2]
Yang, Chunyuan [1]
Li, Mansheng [1]
Wu, Songfeng [1]
Wang, Xue [1]
Li, Dongsheng [1]
He, Fuchu [1]
Hermjakob, Henning [1, 3]
Zhu, Yunping [1, 4]
Affiliations
[1] Natl Ctr Prot Sci Beijing, Beijing Proteome Res Ctr, Beijing Inst Life, State Key Lab Prote, Beijing 102206, Peoples R China
[2] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 26469, Peoples R China
[3] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
[4] Anhui Med Univ, Basic Med Sch, Hefei 230032, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1093/nar/gkab1081
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful Elasticsearch engine can query millions of records within one second. In addition, several new features have been added to iProX for better open data sharing, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast-growing field of proteomics.
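The Universal Spectrum Identifier mentioned in the abstract is a colon-delimited string of the form `mzspec:<collection>:<msRun>:<index type>:<index>[:<interpretation>]`. The following is a minimal sketch of splitting such a string into its fields. It is not iProX code: the `USI` tuple and the `parse_usi` function are hypothetical names introduced here for illustration, and the sketch assumes the run name contains no colons, which a full parser would have to handle.

```python
from typing import NamedTuple, Optional


class USI(NamedTuple):
    """Fields of a Universal Spectrum Identifier (illustrative container)."""
    collection: str                 # dataset accession, e.g. a PXD identifier
    ms_run: str                     # run (file) name within the dataset
    index_type: str                 # "scan", "index" or "nativeId"
    index: str                      # spectrum index, kept as text
    interpretation: Optional[str]   # optional peptide/charge annotation


def parse_usi(usi: str) -> USI:
    """Split a USI into its fields.

    Simplifying assumption: the run name contains no colons. The optional
    interpretation may contain colons (e.g. "[UNIMOD:35]"), so the split
    is capped at five separators.
    """
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in {"scan", "index", "nativeId"}:
        raise ValueError(f"unknown index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


if __name__ == "__main__":
    # Example identifier with made-up run name and scan number.
    print(parse_usi("mzspec:PXD000561:example_run_01:scan:17555:VLHPLEGAVVIIFK/2"))
```

Capping the split keeps modification annotations in the interpretation intact, at the cost of mis-parsing the rare run names that themselves contain colons.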
Pages: D1522-D1527
Page count: 6
Related papers
20 in total
[1]   The application of Hadoop in structural bioinformatics [J].
Alnasir, Jamie J. ;
Shanahan, Hugh P. .
BRIEFINGS IN BIOINFORMATICS, 2020, 21 (01) :96-105
[2]   Data Management of Sensitive Human Proteomics Data: Current Practices, Recommendations, and Perspectives for the Future [J].
Bandeira, Nuno ;
Deutsch, Eric W. ;
Kohlbacher, Oliver ;
Martens, Lennart ;
Vizcaino, Juan Antonio .
MOLECULAR & CELLULAR PROTEOMICS, 2021, 20
[3]   The Encyclopedia of Proteome Dynamics: a big data ecosystem for (prote)omics [J].
Brenes, Alejandro ;
Afzal, Vackar ;
Kent, Robert ;
Lamond, Angus I. .
NUCLEIC ACIDS RESEARCH, 2018, 46 (D1) :D1202-D1209
[4]   Universal Spectrum Identifier for mass spectra [J].
Deutsch, Eric W. ;
Perez-Riverol, Yasset ;
Carver, Jeremy ;
Kawano, Shin ;
Mendoza, Luis ;
Van Den Bossche, Tim ;
Gabriels, Ralf ;
Binz, Pierre-Alain ;
Pullman, Benjamin ;
Sun, Zhi ;
Shofstahl, Jim ;
Bittremieux, Wout ;
Mak, Tytus D. ;
Klein, Joshua ;
Zhu, Yunping ;
Lam, Henry ;
Vizcaino, Juan Antonio ;
Bandeira, Nuno .
NATURE METHODS, 2021, 18 (07) :768-+
[5]   The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics [J].
Deutsch, Eric W. ;
Bandeira, Nuno ;
Sharma, Vagisha ;
Perez-Riverol, Yasset ;
Carver, Jeremy J. ;
Kundu, Deepti J. ;
Garcia-Seisdedos, David ;
Jarnuczak, Andrew F. ;
Hewapathirana, Suresh ;
Pullman, Benjamin S. ;
Wertz, Julie ;
Sun, Zhi ;
Kawano, Shin ;
Okuda, Shujiro ;
Watanabe, Yu ;
Hermjakob, Henning ;
MacLean, Brendan ;
MacCoss, Michael J. ;
Zhu, Yunping ;
Ishihama, Yasushi ;
Vizcaino, Juan A. .
NUCLEIC ACIDS RESEARCH, 2020, 48 (D1) :D1145-D1152
[6]   PASSEL: The PeptideAtlas SRM experiment library [J].
Farrah, Terry ;
Deutsch, Eric W. ;
Kreisberg, Richard ;
Sun, Zhi ;
Campbell, David S. ;
Mendoza, Luis ;
Kusebauch, Ulrike ;
Brusniak, Mi-Youn ;
Huettenhain, Ruth ;
Schiess, Ralph ;
Selevsek, Nathalie ;
Aebersold, Ruedi ;
Moritz, Robert L. .
PROTEOMICS, 2012, 12 (08) :1170-1175
[7]   Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma [J].
Jiang, Ying ;
Sun, Aihua ;
Zhao, Yang ;
Ying, Wantao ;
Sun, Huichuan ;
Yang, Xinrong ;
Xing, Baocai ;
Sun, Wei ;
Ren, Liangliang ;
Hu, Bo ;
Li, Chaoying ;
Zhang, Li ;
Qin, Guangrong ;
Zhang, Menghuan ;
Chen, Ning ;
Zhang, Manli ;
Huang, Yin ;
Zhou, Jinan ;
Zhao, Yan ;
Liu, Mingwei ;
Zhu, Xiaodong ;
Qiu, Yang ;
Sun, Yanjun ;
Huang, Cheng ;
Yan, Meng ;
Wang, Mingchao ;
Liu, Wei ;
Tian, Fang ;
Xu, Huali ;
Zhou, Jian ;
Wu, Zhenyu ;
Shi, Tieliu ;
Zhu, Weimin ;
Qin, Jun ;
Xie, Lu ;
Fan, Jia ;
Qian, Xiaohong ;
He, Fuchu ;
Zhu, Yunping ;
Wang, Yi ;
Yang, Dong ;
Liu, Wanlin ;
Liu, Qiongming ;
Yang, Xiaoming ;
Zhen, Bei ;
Wu, Zhenyu ;
Fan, Jia ;
Sun, Huichuan ;
Qian, Juying ;
Hong, Tao .
NATURE, 2019, 567 (7747) :257-+
[8]   The challenges of big data biology [J].
Leonelli, Sabina .
ELIFE, 2019, 8
[9]   Enabling Massive XML-Based Biological Data Management in HBase [J].
Liu, Jian ;
Liu, Qiuru ;
Zhang, Lei ;
Su, Shuhui ;
Liu, Yongzhuang .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17 (06) :1994-2004
[10]   iProX: an integrated proteome resource [J].
Ma, Jie ;
Chen, Tao ;
Wu, Songfeng ;
Yang, Chunyuan ;
Bai, Mingze ;
Shu, Kunxian ;
Li, Kenli ;
Zhang, Guoqing ;
Jin, Zhong ;
He, Fuchu ;
Hermjakob, Henning ;
Zhu, Yunping .
NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) :D1211-D1217