A multi-task positive-unlabeled learning framework to predict secreted proteins in human body fluids

Cited by: 1
Authors
He, Kai [1]
Wang, Yan [1, 2]
Xie, Xuping [1]
Shao, Dan [3]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[2] Jilin Univ, Sch Artificial Intelligence, Changchun 130012, Peoples R China
[3] Changchun Univ, Coll Comp Sci & Technol, Changchun 130022, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Secreted protein discovery; Semi-supervised learning; Convolutional neural network; Multi-task learning; PLASMA PROTEOME DATABASE; CEREBROSPINAL-FLUID; WEB SERVER; BIOMARKERS; RESOURCE; DISCOVERY;
DOI
10.1007/s40747-023-01221-1
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Body fluid biomarkers are important because they can be detected in a non-invasive or minimally invasive way. The discovery of secreted proteins in human body fluids is an essential step toward proteomic biomarker identification for human diseases. Recently, many computational methods have been proposed to predict secreted proteins, with some success. However, most of them rely on a manually constructed negative dataset, which is usually biased and therefore limits prediction performance. In this paper, we first propose a novel positive-unlabeled learning framework to predict secreted proteins in a single body fluid. Secreted protein discovery in a single body fluid is transformed into multiple binary classification tasks and solved via multi-task learning. An effective convolutional neural network is also employed to reduce overfitting. We then extend this framework to predict secreted proteins in multiple body fluids simultaneously. The improved framework adopts a globally shared network to further improve prediction performance across all body fluids. It was trained and evaluated on datasets of 17 body fluids, and averaged over the 17 body fluids it achieved an accuracy of 89.48%, an F1 score of 56.17%, and a PRAUC of 58.93%. The comparative results demonstrate that the improved framework performs much better than other state-of-the-art methods in secreted protein discovery.
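The abstract does not say how the positive-unlabeled problem is split into binary tasks, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes the unlabeled pool is partitioned into disjoint chunks, each treated as provisional negatives against all known positives, and that the per-task scores are averaged. The function names (`make_binary_tasks`, `fit_centroid_scorer`, `predict`), the toy 2-D features, and the nearest-centroid scorer standing in for the paper's convolutional network are all hypothetical.

```python
import random
from statistics import mean


def make_binary_tasks(positives, unlabeled, n_tasks, seed=0):
    """Partition the unlabeled pool into n_tasks disjoint chunks.

    Each chunk is treated as provisional negatives and paired with all
    positives, giving one binary task per chunk (an assumed scheme).
    """
    rng = random.Random(seed)
    pool = list(unlabeled)
    rng.shuffle(pool)
    chunks = [pool[i::n_tasks] for i in range(n_tasks)]
    return [(list(positives), chunk) for chunk in chunks]


def fit_centroid_scorer(pos, neg):
    """Toy stand-in for a task-specific classifier head.

    Scores a sample by how much closer it lies to the positive centroid
    than to the provisional-negative centroid.
    """
    dim = len(pos[0])
    c_pos = [mean(x[d] for x in pos) for d in range(dim)]
    c_neg = [mean(x[d] for x in neg) for d in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return d_neg - d_pos  # > 0: closer to the positive centroid

    return score


def predict(scorers, x):
    """Average the per-task scores and threshold at zero."""
    return mean(s(x) for s in scorers) > 0


# Hypothetical 2-D features: positives cluster near (1, 1); the unlabeled
# pool is mostly near (0, 0) but hides one positive-like sample.
positives = [(1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
unlabeled = [(0.0, 0.1), (0.1, -0.1), (-0.2, 0.0), (0.05, 0.2),
             (1.1, 0.9), (-0.1, -0.1), (0.2, 0.0), (0.0, -0.2)]

tasks = make_binary_tasks(positives, unlabeled, n_tasks=2)
scorers = [fit_centroid_scorer(pos, neg) for pos, neg in tasks]
```

Calling `predict(scorers, x)` on a new feature vector then gives the ensemble decision. Averaging over several provisional-negative chunks is meant to dilute the effect of hidden positives in any single chunk; in a multi-task network the scorers would instead share a common feature extractor.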
Pages: 1319-1331
Number of pages: 13