Combing ontologies and dipeptide composition for predicting DNA-binding proteins

Citations: 29
Authors
Nanni, Loris [1]
Lumini, Alessandra [1]
Affiliations
[1] Univ Bologna, DEIS, CNR, IEIIT, I-40136 Bologna, Italy
Keywords
DNA-binding proteins; gene ontology; dipeptide composition; Chou's pseudo amino acid composition; multi-classifier;
DOI
10.1007/s00726-007-0016-3
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Given a novel protein, it is important to know whether it is a DNA-binding protein, because DNA-binding proteins play a fundamental role in regulating gene expression. In this work, we propose a parallel fusion of a classifier trained on features extracted from the gene ontology database and a classifier trained on the dipeptide composition of the protein. The classifiers used are the support vector machine (SVM) and the 1-nearest neighbour. The Matthews correlation coefficient obtained by our fusion method is approximately 0.97 under jackknife cross-validation; this exceeds the best result reported in the literature on the same dataset (0.924), where the SVM is trained using only features based on Chou's pseudo amino acid composition. We also report the area under the ROC curve (AUC), and our results show that the fusion achieves an AUC of 0.995. In particular, our fusion obtains a 5% false-negative rate at a 0% false-positive rate. The Matthews correlation coefficient obtained using the single best GO number is only 0.7211, so the gene ontology database cannot be used as a simple lookup table. Finally, we test the complementarity of the two feature extraction methods using the Q-statistic. We obtain a value of 0.58, which means that the features extracted from the gene ontology database and those extracted from the amino acid sequence are partially independent and that their parallel fusion deserves further study.
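The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the fusion is assumed to be a simple average of two classifier scores on the same scale (the abstract does not state the rule used), and the Q-statistic is taken to be Yule's Q computed from per-sample correctness flags.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered pairs of standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of a protein sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def sum_rule_fusion(score_a, score_b):
    """Assumed fusion rule: average two classifier scores."""
    return (score_a + score_b) / 2


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers from per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1  # both classifiers correct
        elif not a and not b:
            n00 += 1  # both classifiers wrong
        elif a:
            n10 += 1  # only the first classifier correct
        else:
            n01 += 1  # only the second classifier correct
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0
```

Under this definition, Q ranges from -1 to 1; values near 0 indicate classifiers whose errors are close to independent, and values near 1 indicate classifiers that tend to err on the same samples.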
Pages: 635-641
Page count: 7