Circular effects in representations of an RNA nucleotides data set in relation with principal components analysis

Cited by: 13
Authors
Reijmers, TH [1]
Wehrens, R [1]
Buydens, LMC [1]
Affiliations
[1] Catholic Univ Nijmegen, Analyt Chem Lab, NL-6525 ED Nijmegen, Netherlands
Keywords
PCA; data mining; RNA nucleotides; multivariate analysis;
DOI
10.1016/S0169-7439(01)00109-5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
During the last few years, the main reason for using molecular structure databases has changed. Instead of serving only as a storage medium, databases are now also used as a source for data-mining applications. Because these databases contain a large number of objects and variables, multivariate techniques are applied alongside univariate techniques to search for knowledge hidden in the data. A popular multivariate technique for exploring the underlying structure in data is principal component analysis (PCA). Because structure data are often represented as torsion angles, and PCA was not originally designed to deal with this kind of circular data, the outcome of PCA experiments can be misleading. This article describes several alternative representations of circular data and their effect on the outcome of PCA experiments. A worked example is given using a database of RNA nucleotides. © 2001 Elsevier Science B.V. All rights reserved.
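The circularity problem mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which representations the article evaluates, so the (cos, sin) encoding below is an assumption, chosen only because it is a common way to make -180° and 180° coincide. The data are synthetic, and the names `encode_circular` and `pca_explained_variance` are hypothetical helpers, not part of the article.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the variance along each principal component of X (rows = objects)."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered matrix give the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / (X.shape[0] - 1)

def encode_circular(angles_deg):
    """Map each angle column to a (cos, sin) pair so that -180 and 180 coincide."""
    rad = np.deg2rad(angles_deg)
    return np.hstack([np.cos(rad), np.sin(rad)])

rng = np.random.default_rng(0)
# Synthetic (assumed) torsion angle: one tight cluster centered at 180 degrees,
# wrapped into the interval [-180, 180) as torsion angles usually are.
raw = 180.0 + rng.normal(0.0, 5.0, size=(200, 1))
wrapped = (raw + 180.0) % 360.0 - 180.0

# On the raw angles the single cluster is split into two groups near -180 and
# +180, so PCA sees a very large variance that is purely a wrap-around artifact.
var_raw = pca_explained_variance(wrapped)[0]

# After the (cos, sin) encoding the cluster is compact again.
var_enc = pca_explained_variance(encode_circular(wrapped)).sum()

print("variance on raw angles (deg^2):", var_raw)
print("total variance after (cos, sin) encoding:", var_enc)
```

The two variances are in different units and should not be compared numerically; the point is only that the raw representation separates objects that are physically adjacent, whereas the encoded one does not.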
Pages: 61-71
Number of pages: 11