Positive-unlabeled learning in bioinformatics and computational biology: a brief review

Cited by: 41
Authors
Li, Fuyi [1 ]
Dong, Shuangyu [2 ]
Leier, Andre [3 ,4 ,5 ]
Han, Meiya [6 ]
Guo, Xudong
Xu, Jing [6 ,7 ]
Wang, Xiaoyu [6 ,7 ]
Pan, Shirui [8 ,9 ]
Jia, Cangzhi [10 ]
Zhang, Yang [11 ]
Webb, Geoffrey I. [12 ,13 ]
Coin, Lachlan J. M. [14 ,15 ]
Li, Chen [6 ,7 ]
Song, Jiangning [16 ,17 ]
Affiliations
[1] Univ Melbourne, Peter Doherty Inst Infect & Immun, Melbourne, Vic, Australia
[2] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic, Australia
[3] UAB Sch Med, Dept Genet, Birmingham, AL USA
[4] UABs ONeal Comprehens Canc Ctr, Birmingham, AL USA
[5] Gregory Fleming James Cyst Fibrosis Res Ctr, Birmingham, AL USA
[6] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[7] Monash Univ, Biomed Discovery Inst, Melbourne, Vic, Australia
[8] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[9] Univ Technol Sydney, Sch Software, Sydney, NSW, Australia
[10] Dalian Maritime Univ, Coll Sci, Dalian, Peoples R China
[11] Northwest A&F Univ, Coll Informat Engn, Yangling, Shaanxi, Peoples R China
[12] Monash Univ, Monash Data Futures Inst, Melbourne, Vic, Australia
[13] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[14] Univ Melbourne, Dept Microbiol & Immunol, Melbourne, Vic, Australia
[15] Univ Melbourne, Dept Clin Pathol, Melbourne, Vic, Australia
[16] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic, Australia
[17] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
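Funding
National Health and Medical Research Council of Australia; US National Institutes of Health; Australian Research Council; UK Medical Research Council;
Keywords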
positive unlabeled learning; semi-supervised learning; machine learning; bioinformatics; pattern recognition; PROTEIN FUNCTION; PREDICTION; INTEGRATION; SEQUENCE; SITES; PROMOTERS; NETWORKS;
DOI
10.1093/bib/bbab461
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and negative samples may be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from a limited number of positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art bioinformatic applications of PU learning. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for better understanding the PU learning framework in bioinformatics and for further developing next-generation PU learning frameworks for critical biological applications.
Pages: 13
Related papers
50 in total
  • [41] PURE: Positive-Unlabeled Recommendation with Generative Adversarial Network
    Zhou, Yao
    Xu, Jianpeng
    Wu, Jun
    Taghavi, Zeinab
    Korpeoglu, Evren
    Achan, Kannan
    He, Jingrui
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021 : 2409 - 2419
  • [42] Computational intelligence, bioinformatics and computational biology: A brief overview of methods, problems and perspectives
    Kasabov, N
    Sidorov, IA
    Dimitrov, DS
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2005, 2 (04) : 473 - 491
  • [43] Enhancing landslide susceptibility mapping using a positive-unlabeled machine learning approach: a case study in Chamoli, India
    Zhang, Danrong
    Jindal, Dipali
    Roy, Nimisha
    Vangla, Prashanth
    Frost, J. David
    GEOENVIRONMENTAL DISASTERS, 2024, 11 (01)
  • [44] Case-Related News Filtering via Topic-Enhanced Positive-Unlabeled Learning
    Wang, Guanwen
    Yu, Zhengtao
    Xian, Yantuan
    Zhang, Yu
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (06) : 1057 - 1070
  • [45] A Positive-Unlabeled Learning Model for Extending a Vietnamese Petroleum Dictionary Based on Vietnamese Wikipedia Data
    Ngoc-Trinh Vu
    Quoc-Dat Nguyen
    Tien-Dat Nguyen
    Manh-Cuong Nguyen
    Van-Vuong Vu
    Quang-Thuy Ha
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2018, PT I, 2018, 10751 : 190 - 199
  • [46] Recent Advances of Deep Learning in Bioinformatics and Computational Biology
    Tang, Binhua
    Pan, Zixiang
    Yin, Kang
    Khateeb, Asif
    FRONTIERS IN GENETICS, 2019, 10
  • [47] A Brief Review for Identifying Prokaryotic Promoters Based on Computational Biology
    Su W.
    Sun Z.
    Yue P.
    Lin H.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2021, 50 (05) : 667 - 675
  • [48] Diffusion models in bioinformatics and computational biology
    Guo, Zhiye
    Liu, Jian
    Wang, Yanli
    Chen, Mengrui
    Wang, Duolin
    Xu, Dong
    Cheng, Jianlin
    NATURE REVIEWS BIOENGINEERING, 2024, 2 (02) : 136 - 154
  • [49] ADR-DQPU: A Novel ADR Signal Detection Using Deep Reinforcement and Positive-Unlabeled Learning
    Chung, Chun-Kit
    Lin, Wen-Yang
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2025, 29 (02) : 831 - 839
  • [50] A multi-task positive-unlabeled learning framework to predict secreted proteins in human body fluids
    He, Kai
    Wang, Yan
    Xie, Xuping
    Shao, Dan
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (01) : 1319 - 1331