A multi-task positive-unlabeled learning framework to predict secreted proteins in human body fluids

Times cited: 1
Authors
He, Kai [1]
Wang, Yan [1,2]
Xie, Xuping [1]
Shao, Dan [3]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Sch Artificial Intelligence, Changchun 130012, Peoples R China
[3] Changchun Univ, Coll Comp Sci & Technol, Changchun 130022, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Secreted protein discovery; Semi-supervised learning; Convolutional neural network; Multi-task learning; PLASMA PROTEOME DATABASE; CEREBROSPINAL-FLUID; WEB SERVER; BIOMARKERS; RESOURCE; DISCOVERY
DOI
10.1007/s40747-023-01221-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Body fluid biomarkers are important because they can be detected non-invasively or minimally invasively. The discovery of secreted proteins in human body fluids is an essential step toward identifying proteomic biomarkers of human diseases. Many computational methods have recently been proposed to predict secreted proteins, with some success. However, most of them rely on a manually constructed negative dataset, which is usually biased and therefore limits prediction performance. In this paper, we first propose a novel positive-unlabeled learning framework to predict secreted proteins in a single body fluid. Secreted protein discovery in a single body fluid is transformed into multiple binary classification tasks, which are solved via multi-task learning. An effective convolutional neural network is also employed to reduce overfitting. We then improve this framework to predict secreted proteins in multiple body fluids simultaneously. The improved framework adopts a globally shared network to further improve prediction performance across all body fluids. It was trained and evaluated on datasets of 17 body fluids; averaged over the 17 body fluids, it achieved an accuracy of 89.48%, an F1 score of 56.17%, and a PRAUC of 58.93%. Comparative results show that the improved framework outperforms other state-of-the-art methods in secreted protein discovery.
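The abstract reports accuracy, F1 score, and PRAUC averaged over 17 body fluids. The Python sketch below shows one way such per-fluid metrics could be computed and macro-averaged; it is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the use of step-wise average precision as the PRAUC estimate are all assumptions made for illustration, and the paper may compute PRAUC differently (for example, with trapezoidal interpolation).

```python
from typing import Dict, List, Tuple


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of proteins whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 score of the positive (secreted) class; 0.0 if it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumption: used here as the PRAUC estimate. Tied scores are not
    grouped, which is acceptable for a sketch.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank proteins from most to least likely to be secreted.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at each newly reached recall level
    return total / n_pos


def macro_average(
    per_fluid: Dict[str, Tuple[List[int], List[float]]],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Dict[str, float]:
    """Compute each metric per body fluid, then average with equal weight."""
    if not per_fluid:
        raise ValueError("need at least one body fluid")
    totals = {"accuracy": 0.0, "f1": 0.0, "prauc": 0.0}
    for y_true, scores in per_fluid.values():
        y_pred = [1 if s >= threshold else 0 for s in scores]
        totals["accuracy"] += accuracy(y_true, y_pred)
        totals["f1"] += f1(y_true, y_pred)
        totals["prauc"] += average_precision(y_true, scores)
    k = len(per_fluid)
    return {name: value / k for name, value in totals.items()}
```

For example, in a hypothetical fluid where one of ten proteins is secreted, a model that labels every protein as non-secreted reaches 90% accuracy but an F1 score of 0. Accuracy and F1 can therefore diverge sharply on imbalanced data, which is one reason to report F1 and PRAUC alongside accuracy.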
Pages: 1319-1331
Number of pages: 13