Identification and analysis of transcription factor family-specific features derived from DNA and protein information

Cited by: 3
Authors
Anand, Ashish [1]
Pugalenthi, Ganesan [1]
Fogel, Gary B. [2]
Suganthan, P. N. [1]
Affiliations
[1] Nanyang Technol Univ, Sch EEE, Singapore 639798, Singapore
[2] Nat Select Inc, San Diego, CA 92121 USA
Keywords
Transcription factor; TF family-specific features; TF-TFBS interaction; Multi-class classification; TFBS; Feature selection; STRUCTURAL CLASS; BINDING MOTIF; RECOGNITION; GENE; CLASSIFICATION; EXPRESSION; PREDICTION; DATABASE; DOMAIN; SELECTION
DOI
10.1016/j.patrec.2009.10.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A common approach for understanding the relationship between transcription factors (TFs) and transcription factor binding sites (TFBSs) is to use features at either the TF level or the DNA level. For a given TF family, features can be derived from the DNA-binding domains at the protein level as well as from TF binding sites at the DNA sequence level. Here we investigate the relative importance of features from these two levels for the main TF families to better understand: (1) family-specific features and (2) the proportion of features from either the DNA or protein level. We perform class-wise feature selection on TF families to identify important features for each family. The importance of the selected features is assessed in terms of the predictive accuracy of assigning TFs and their associated TFBSs to the correct TF families. Evaluation of the best model on an independent test set resulted in a predictive accuracy of approximately 90%. Analysis of the selected features used in the best model on a family-by-family basis is congruent with the fact that the interaction between TF proteins and TFBSs in the DNA is quite family specific. Our analysis further suggests that: (1) this approach can be used to determine and better understand which features (at both the DNA and protein levels) are important to consider for each TF family, and (2) a similar approach that combines DNA-level and protein-level features may be useful for other datasets where protein-DNA interaction is a key component of biological function. © 2009 Elsevier B.V. All rights reserved.
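The idea of class-wise feature selection over combined DNA-level and protein-level features can be illustrated with a minimal sketch. It is not the authors' method: the one-vs-rest separation score below is a simple stand-in chosen for illustration, and the family labels (`A`, `B`, `C`), the feature names, the `dna_`/`prot_` prefixes, and all numeric values are hypothetical toy data.

```python
from statistics import mean, pstdev

# Hypothetical feature names; the prefix marks the level each feature comes from.
FEATURES = ["dna_gc_content", "dna_site_length",
            "prot_helix_fraction", "prot_net_charge"]

# Made-up toy samples (one row per TF/TFBS pair) and made-up family labels.
X = [
    [0.90, 10, 0.30, 1.0], [0.85, 12, 0.35, 2.0], [0.95, 11, 0.32, 1.5],  # A
    [0.40, 11, 0.80, 1.2], [0.45, 10, 0.85, 1.8], [0.42, 12, 0.82, 1.4],  # B
    [0.41, 20, 0.31, 1.6], [0.44, 21, 0.34, 1.1], [0.43, 19, 0.33, 1.9],  # C
]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3


def classwise_feature_selection(X, y, feature_names, top_k=1):
    """Pick the top_k features per class using a one-vs-rest score.

    Score (an illustrative assumption): absolute difference between the
    in-class and out-of-class means, divided by the sum of the two
    population standard deviations.
    """
    selected = {}
    for cls in sorted(set(y)):
        scores = []
        for j, name in enumerate(feature_names):
            inside = [row[j] for row, label in zip(X, y) if label == cls]
            outside = [row[j] for row, label in zip(X, y) if label != cls]
            spread = pstdev(inside) + pstdev(outside) + 1e-9  # avoid /0
            scores.append((abs(mean(inside) - mean(outside)) / spread, name))
        scores.sort(reverse=True)
        selected[cls] = [name for _, name in scores[:top_k]]
    return selected


def level_proportions(selected):
    """Fraction of all selected features that are DNA-level vs protein-level."""
    names = [name for feats in selected.values() for name in feats]
    dna = sum(name.startswith("dna_") for name in names)
    return {"dna": dna / len(names), "protein": (len(names) - dna) / len(names)}


selected = classwise_feature_selection(X, y, FEATURES, top_k=1)
proportions = level_proportions(selected)
print(selected)
print(proportions)
```

On this toy data each family ends up with a different top feature, and two of the three selected features are DNA-level. In a real analysis the selected subsets would then be scored by the accuracy of a multi-class classifier on held-out data, as the abstract describes.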
Pages: 2097-2102
Page count: 6