Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model

Cited by: 50
Authors
Akbar, Shahid [1, 2]
Raza, Ali [3]
Zou, Quan [1, 4]
Affiliations
[1] Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu 610054, Peoples R China
[2] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan 23200, KP, Pakistan
[3] Qurtuba Univ Sci & Informat Technol, Dept Phys & Numer Sci, Peshawar 25124, KP, Pakistan
[4] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Quzhou, Quzhou 324000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Antiviral peptides; Prediction; Tri-segmentation based evolutionary features; Word embedding; Feature selection; Stacked ensemble model; MACHINE LEARNING TECHNIQUES; PROTEINS;
DOI
10.1186/s12859-024-05726-5
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Viral infections have been a major health issue over the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. Substantial numbers of antiviral vaccines and medications have been produced, and the recent development of AVPs as antiviral agents suggests an effective way to treat virus-affected cells. Interest in applying machine learning techniques to the development of peptide-based therapeutic agents is also growing because of their significant outcomes. Existing wet-laboratory approaches are expensive and time-consuming, and they cannot effectively screen and predict the targeted motifs of antiviral peptides.

Methods: In this paper, we propose a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physicochemical properties based on structural features. In addition, a fused vector is formed from the PSSM-TS features, the semantic information, and the CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are used to train a stacked-ensemble classifier.

Results: The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. On the independent samples, the model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97.

Conclusion: Our Deepstacked-AVPs model outperformed existing models, with ~4% and ~2% higher accuracy on the training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists, and it may play a beneficial role in pharmaceutical design and academic research.
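The abstract names two steps that are easy to illustrate in isolation: tri-segmenting a PSSM into a fixed-length vector, and ranking features by information gain. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how PSSM-TS pools each segment or how IG treats continuous features, so the column averaging, the mean-threshold binarisation, and all function names (`tri_segment_profile`, `information_gain`, `select_top_k`) are hypothetical choices made here for illustration. The word2vec, CTDT, and stacked-ensemble stages are not covered.

```python
import math
from collections import Counter


def tri_segment_profile(pssm):
    """Turn an L x C score matrix into a fixed 3*C vector.

    Assumption: the rows are split into three consecutive segments and each
    column is averaged within each segment. The source only names the
    tri-segmentation idea, not the pooling rule.
    """
    length = len(pssm)
    if length < 3:
        raise ValueError("need at least three residues to form three segments")
    bounds = [0, length // 3, 2 * length // 3, length]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        segment = pssm[start:end]
        for col in range(len(segment[0])):
            features.append(sum(row[col] for row in segment) / len(segment))
    return features


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature about the class labels.

    Assumption: the continuous feature is binarised at its mean before the
    standard IG formula H(Y) - H(Y | X) is applied.
    """
    threshold = sum(values) / len(values)
    low = [y for v, y in zip(values, labels) if v <= threshold]
    high = [y for v, y in zip(values, labels) if v > threshold]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (low, high) if part)
    return entropy(labels) - remainder


def select_top_k(matrix, labels, k):
    """Return the indices of the k feature columns with the highest IG."""
    n_features = len(matrix[0])
    gains = [information_gain([row[j] for row in matrix], labels)
             for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]
```

For a real PSSM with 20 columns, `tri_segment_profile` would return a 60-dimensional vector regardless of peptide length, which is what makes it possible to concatenate it with other fixed-length descriptors before feature selection.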
Pages: 16