Multi-target QPDR classification model for human breast and colon cancer-related proteins using star graph topological indices

Cited by: 81
Authors
Munteanu, Cristian Robert [2 ]
Magalhaes, Alexandre L. [2 ]
Uriarte, Eugenio [1 ,3 ]
Gonzalez-Diaz, Humberto [1 ,3 ]
Affiliations
[1] Univ Santiago Compostela, Fac Pharm, Inst Ind Pharm, UBICA, Santiago De Compostela 15782, Spain
[2] Univ Porto, Dept Chem, Fac Sci, REQUIMTE, P-4169007 Oporto, Portugal
[3] Univ Santiago Compostela, Fac Pharm, Dept Organ Chem, Santiago De Compostela 15782, Spain
Keywords
Input-coded multi-target QPDR; Star graph; Cancer theoretical model; Clinical proteomics; GDA method; AMINO-ACID-COMPOSITION; COAGULATION-FACTOR-XIII; SUPPORT VECTOR MACHINE; DNA PRIMARY SEQUENCES; SUBCELLULAR LOCATION; CELLULAR-AUTOMATA; CODON USAGE; QSAR MODEL; PREDICTION; REPRESENTATION;
DOI
10.1016/j.jtbi.2008.11.017
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Cancer diagnosis is a complex process and, sometimes, specific markers can interfere or produce negative results. Thus, new simple and fast theoretical models are required. One option is complex network graph theory, which permits us to describe any real system, from small molecules to complex genetic, neural or social networks, by transforming real properties into topological indices. This work converts protein primary structure data into specific Randic's star network topological indices using the new Sequence to Star Networks (S2SNet) application. A set of 1054 proteins was selected from previous works; it contains proteins related or not related to two types of cancer, human breast cancer (HBC) and human colon cancer (HCC). The general discriminant analysis method generates an input-coded multi-target classification model with training/predicting set accuracies of 90.0% for the forward stepwise model type. In addition, a protein subset was modified by single amino acid mutations with higher log-odds PAM250 values and tested with the new classification model to determine whether the mutants can be related to HBC or HCC. In conclusion, we have shown that, using simple input data such as the primary protein sequence and the simplest linear analysis, it is possible to obtain accurate classification models that can predict whether a new protein is related to two types of cancer. These results promote the use of S2SNet in clinical proteomics. (C) 2008 Elsevier Ltd. All rights reserved.
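As a rough illustration of the sequence-to-star-graph idea mentioned in the abstract, the sketch below builds a simple non-embedded star graph from a protein sequence and computes two classical topological indices. It is not the S2SNet implementation: the function names `star_graph`, `wiener_index` and `randic_index` are hypothetical, the dummy centre node and one-branch-per-residue-letter layout are assumptions about the general construction, and the Wiener and Randic connectivity indices were picked only as familiar examples, not as the indices used in the paper's model.

```python
from collections import defaultdict, deque


def star_graph(sequence):
    """Build adjacency sets for a simple, non-embedded star graph (assumed layout).

    Node 0 is a dummy centre. Each distinct residue letter owns one branch,
    and residues with that letter are chained along the branch in order of
    appearance. Nodes 1..n are the sequence positions.
    """
    adjacency = defaultdict(set)
    branch_tip = {}  # residue letter -> last node placed on its branch
    for position, residue in enumerate(sequence, start=1):
        previous = branch_tip.get(residue, 0)  # empty branch attaches to the centre
        adjacency[previous].add(position)
        adjacency[position].add(previous)
        branch_tip[residue] = position
    return adjacency


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered node pairs."""
    total = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from each node
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
    return total // 2  # every pair was counted twice


def randic_index(adjacency):
    """Randic connectivity index: sum over edges of 1 / sqrt(deg(u) * deg(v))."""
    return sum(
        (len(adjacency[u]) * len(adjacency[v])) ** -0.5
        for u in adjacency
        for v in adjacency[u]
        if u < v  # visit each undirected edge once
    )


if __name__ == "__main__":
    graph = star_graph("ACA")  # toy three-residue sequence, not real data
    print(wiener_index(graph), randic_index(graph))
```

In a workflow of the kind the abstract describes, a vector of such indices per protein would serve as the input to a linear classifier; that step is omitted here.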
Pages: 303-311
Number of pages: 9