Compositional framework for multitask learning in the identification of cleavage sites of HIV-1 protease

Cited by: 11
Authors
Singh, Deepak [1 ]
Sisodia, Dilip Singh [1 ]
Singh, Pradeep [1 ]
Affiliations
[1] Natl Inst Technol, Dept Comp Sci & Engn, GE Rd Raipur, Raipur 492001, Chhattisgarh, India
Keywords
HIV-1; protease; Multifactorial evolution; Multitask learning; Multiple Kernel learning; Protein encoding; MULTIFACTORIAL INHERITANCE; SUBNUCLEAR LOCALIZATION; CULTURAL TRANSMISSION; NEURAL-NETWORKS; PREDICTION; ENSEMBLE; CLASSIFIERS; SELECTION;
DOI
10.1016/j.jbi.2020.103376
CLC number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Limited patient samples and the high cost of generating annotated data lead to small datasets in the biomedical domain. As a result, a model trained on a single small dataset usually captures only that dataset's associations and fails to yield robust insights. To cope with data sparsity, a promising strategy, applied in various domains, is to combine data from different related tasks. Motivated by successful work in various bioinformatics applications, we propose a multi-kernel-based multitask learning model that exploits the dependencies among related tasks. This work aims to combine knowledge from experimental studies on different datasets to build stronger predictive models for HIV-1 protease cleavage site prediction. In this study, a set of peptide data from one source is referred to as a 'task'; to integrate interactions from multiple tasks, our method exploits common features and parameter sharing across the data sources. The proposed framework uses feature integration, feature selection, multi-kernel learning, and a multifactorial evolutionary algorithm to model multitask learning. The framework considers seven feature descriptors and four kernel variants of support vector machines to form the optimal multi-kernel learning model. To validate the effectiveness of the model, performance measures such as average accuracy and area under the curve were evaluated. We also carried out the Friedman test and a post hoc statistical test to substantiate the significance of the improvement achieved by the proposed framework. The results of the extensive experiments confirm that multitask learning can improve performance in cleavage site identification.
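The multi-kernel idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the seven feature descriptors or the four kernel variants, so everything below is an assumption made for illustration: a one-hot (orthogonal) encoding of peptide octamers, a linear and an RBF base kernel, and fixed kernel weights that stand in for whatever the framework's optimization would select. The function names and the example peptide strings are hypothetical and are not taken from the paper.

```python
import math

# The 20 standard amino acids, in a fixed order for encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(peptide):
    """Orthogonal encoding (assumed descriptor): 20 indicators per residue."""
    vec = []
    for residue in peptide:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.1):
    """Gaussian kernel; gamma is an illustrative value, not a tuned one."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combined_kernel(x, y, weights=(0.5, 0.5)):
    """Weighted sum of base kernels; the weights here are hypothetical."""
    w_lin, w_rbf = weights
    return w_lin * linear_kernel(x, y) + w_rbf * rbf_kernel(x, y)

def gram_matrix(peptides, kernel=combined_kernel):
    """Pairwise kernel values for a list of equal-length peptides."""
    feats = [one_hot(p) for p in peptides]
    return [[kernel(a, b) for b in feats] for a in feats]

# Illustrative octamers only; not drawn from the paper's datasets.
peptides = ["SQNYPIVQ", "ARVLAEAM", "SQNYPIVQ"]
K = gram_matrix(peptides)
```

A Gram matrix of this kind could be passed to any support vector machine implementation that accepts a precomputed kernel. In a multitask setting, one such matrix would be built per data source, with the shared kernel weights being the part that couples the tasks.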
Pages: 17
Related papers
78 in total
  • [1] Ando RK, 2005, J MACH LEARN RES, V6, P1817
  • [2] [Anonymous], 2007, Multi-Task Feature Learning, DOI 10.7551/MITPRESS/7503.003.0010
  • [3] [Anonymous], MACHINE LEARNING HEA
  • [4] Convex multi-task feature learning
    Argyriou, Andreas
    Evgeniou, Theodoros
    Pontil, Massimiliano
    [J]. MACHINE LEARNING, 2008, 73 (03) : 243 - 272
  • [5] Prediction of protease substrates using sequence and structure features
    Barkan, David T.
    Hostetter, Daniel R.
    Mahrus, Sami
    Pieper, Ursula
    Wells, James A.
    Craik, Charles S.
    Sali, Andrej
    [J]. BIOINFORMATICS, 2010, 26 (14) : 1714 - 1722
  • [6] Benavoli A, 2016, J MACH LEARN RES, V17
  • [7] A survey of machine learning applications in HIV clinical research and care
    Bisaso, Kuteesa R.
    Anguzu, Godwin T.
    Karungi, Susan A.
    Kiragga, Agnes
    Castelnuovo, Barbara
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2017, 91 : 366 - 371
  • [8] Blake C. L., 1998, UCI Repository of Machine Learning Databases
  • [9] Caruana R, 1998, LEARNING TO LEARN, P95, DOI 10.1007/978-1-4615-5529-2_5
  • [10] Multitask learning
    Caruana, R
    [J]. MACHINE LEARNING, 1997, 28 (01) : 41 - 75