A software program combining sequence motif searches with keywords for finding repeats containing DNA sequences

Cited by: 35
Authors
Bilgen, M [1]
Karaca, M [1]
Onus, AN [1]
Ince, AG [1]
Affiliations
[1] Akdeniz Univ, Fac Agr, TR-07059 Antalya, Turkey
DOI
10.1093/bioinformatics/bth410
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: One of the most interesting features of genomes (in both coding and non-coding regions) is the presence of relatively short, tandemly repeated DNA sequences known as tandem repeats (TRs). We developed a new PC-based stand-alone analysis program that combines sequence motif searches with keywords such as organs, tissues, cell lines or development stages to find exact, inexact and compound TRs. Tandem Repeats Analyzer 1.5 (TRA) offers several advanced search parameters/options compared with other repeat finders: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files but also analyzes multiple files containing multiple sequences. Advanced user-defined parameters/options let researchers apply different search criteria to different motif lengths simultaneously. The output reports statistical results for the user to evaluate. The discovery of TRs in ESTs could be useful both for gene mapping and association studies and for finding TRs located in the coding regions of important genes that are expressed under various conditions of environment, stress, organ, tissue and development stage.
Results: In this paper, we demonstrate applications of TRA using 175 899 EST sequences from three Arabidopsis spp. downloaded from GenBank. The EST-SSR/EST ratios were found to be 43.1%, 15.3% and 2.34% in A. lyrata, A. thaliana and A. halleri, respectively. The analysis revealed that organs, tissues and development stages differed in both the number of repeats and their composition. This indicates that the distribution of TRs among tissues or organs may not be random, in contrast to the untranscribed repeats found in genomes.
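To make the idea of combining a keyword filter with per-motif-length search criteria concrete, here is a minimal Python sketch. It is not TRA's algorithm: it handles exact repeats only (no inexact or compound TRs), and the names `MIN_REPEATS`, `find_exact_tandem_repeats` and `search_with_keyword`, the threshold values, and the record layout of (description, sequence) pairs are all assumptions made for illustration.

```python
import re

# Hypothetical thresholds (not taken from the paper):
# motif length -> minimum number of tandem copies to report.
MIN_REPEATS = {1: 10, 2: 6, 3: 4}


def find_exact_tandem_repeats(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) tuples for exact tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least
        # (min_copies - 1) further identical copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1)
        )
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); the shorter length reports them.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


def search_with_keyword(records, keyword, min_repeats=MIN_REPEATS):
    """Search only records whose description contains the keyword.

    records is an iterable of (description, sequence) pairs, an assumed
    layout standing in for parsed GenBank/FASTA/EST entries.
    """
    keyword = keyword.lower()
    return {
        desc: find_exact_tandem_repeats(seq, min_repeats)
        for desc, seq in records
        if keyword in desc.lower()
    }
```

Each motif length carries its own minimum copy number, so mono-, di- and trinucleotide repeats are searched with different criteria in a single pass over the records, and the keyword restricts the search to entries annotated with, say, a given tissue.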
Pages: 3379-3386
Number of pages: 8