Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Cited by: 41
Authors
Li, Fuyi [1]
Dong, Shuangyu [2]
Leier, Andre [3, 4, 5]
Han, Meiya [6]
Guo, Xudong
Xu, Jing [6, 7]
Wang, Xiaoyu [6, 7]
Pan, Shirui [8, 9]
Jia, Cangzhi [10]
Zhang, Yang [11]
Webb, Geoffrey I. [12, 13]
Coin, Lachlan J. M. [14, 15]
Li, Chen [6, 7]
Song, Jiangning [16, 17]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Melbourne, Vic, Australia
[2] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic, Australia
[3] UAB Sch Med, Dept Genet, Birmingham, AL USA
[4] UABs ONeal Comprehens Canc Ctr, Birmingham, AL USA
[5] Gregory Fleming James Cyst Fibrosis Res Ctr, Birmingham, AL USA
[6] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[7] Monash Univ, Biomed Discovery Inst, Melbourne, Vic, Australia
[8] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[9] Univ Technol Sydney, Sch Software, Sydney, NSW, Australia
[10] Dalian Maritime Univ, Coll Sci, Dalian, Peoples R China
[11] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[12] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[13] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[14] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[15] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
[16] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[17] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
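Funding
National Health and Medical Research Council of Australia; US National Institutes of Health; Australian Research Council; UK Medical Research Council
Keywords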
positive unlabeled learning; semi-supervised learning; machine learning; bioinformatics; pattern recognition; PROTEIN FUNCTION; PREDICTION; INTEGRATION; SEQUENCE; SITES; PROMOTERS; NETWORKS;
DOI
10.1093/bib/bbab461
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
Pages: 13
Related papers
50 in total
  • [31] Positive-Unlabeled Learning for Cell Detection in Histopathology Images with Incomplete Annotations
    Zhao, Zipei
    Pang, Fengqian
    Liu, Zhiwen
    Ye, Chuyang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VIII, 2021, 12908 : 509 - 518
  • [32] Foundations for improved vaccine correlate of risk analysis using positive-unlabeled learning
    Kelkar, Natasha S.
    Morrison, Kyle S.
    Ackerman, Margaret E.
    HUMAN VACCINES & IMMUNOTHERAPEUTICS, 2023, 19 (01)
  • [33] Leveraging Positive-Unlabeled Learning for Enhanced Black Spot Accident Identification on Greek Road Networks
    Sevetlidis, Vasileios
    Pavlidis, George
    Mouroutsos, Spyridon G.
    Gasteratos, Antonios
    COMPUTERS, 2024, 13 (02)
  • [34] Predicting HIV-1 Protease Cleavage Sites With Positive-Unlabeled Learning
    Li, Zhenfeng
    Hu, Lun
    Tang, Zehai
    Zhao, Cheng
    FRONTIERS IN GENETICS, 2021, 12
  • [35] Positive-unlabeled learning identifies vaccine candidate antigens in the malaria parasite Plasmodium falciparum
    Chou, Renee Ti
    Ouattara, Amed
    Adams, Matthew
    Berry, Andrea A.
    Takala-Harrison, Shannon
    Cummings, Michael P.
    NPJ SYSTEMS BIOLOGY AND APPLICATIONS, 2024, 10 (01)
  • [36] Split-PU: Hardness-aware Training Strategy for Positive-Unlabeled Learning
    Xu, Chengming
    Liu, Chen
    Yang, Siqian
    Wang, Yabiao
    Zhang, Shijie
    Jia, Lijie
    Fu, Yanwei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 2719 - 2729
  • [37] Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction
    Zhu, Zhangchi
    Wang, Lu
    Zhao, Pu
    Du, Chao
    Zhang, Wei
    Dong, Hang
    Qiao, Bo
    Lin, Qingwei
    Rajmohan, Saravan
    Zhang, Dongmei
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 3663 - 3673
  • [38] Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data
    Yang, Pengyi
    Humphrey, Sean J.
    James, David E.
    Yang, Yee Hwa
    Jothi, Raja
    BIOINFORMATICS, 2016, 32 (02) : 252 - 259
  • [39] Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification
    Jowkar, Gholam-Hossein
    Mansoori, Eghbal G.
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2016, 64 : 263 - 270
  • [40] NIAPU: network-informed adaptive positive-unlabeled learning for disease gene identification
    Stolfi, Paola
    Mastropietro, Andrea
    Pasculli, Giuseppe
    Tieri, Paolo
    Vergni, Davide
    BIOINFORMATICS, 2023, 39 (02)