Accurate anchoring alignment of divergent sequences

Cited by: 26
Authors
Huang, WC
Umbach, DM
Li, LP [1]
Affiliations
[1] Natl Inst Environm Hlth Sci, Biostat Branch, Res Triangle Pk, NC 27709 USA
[2] N Carolina State Univ, Bioinformat Res Ctr, Raleigh, NC 27606 USA
[3] Duke Univ, Ctr Med, Inst Genome Sci & Policy, Durham, NC 27708 USA
DOI
10.1093/bioinformatics/bti772
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Obtaining high-quality alignments of divergent homologous sequences for cross-species sequence comparison remains a challenge.
Results: We propose a novel pairwise sequence alignment algorithm, ACANA (ACcurate ANchoring Alignment), for aligning biological sequences at both local and global levels. Like many fast heuristic methods, ACANA uses an anchoring strategy. Unlike the others, however, ACANA uses a Smith-Waterman-like dynamic programming algorithm to recursively identify near-optimal regions as anchors for a global alignment. Performance evaluations using a simulated benchmark dataset and real promoter sequences suggest that ACANA is accurate and consistent, especially for divergent sequences. Specifically, on the simulated benchmark dataset, ACANA has the highest sensitivity in aligning constrained functional sites compared to BLASTZ, CHAOS and DIALIGN for local alignment, and compared to AVID, ClustalW, DIALIGN and LAGAN for global alignment. Applied to 6007 pairs of human-mouse orthologous promoter sequences, ACANA identified the largest number of conserved regions (defined as over 70% identity over 100 bp) compared to AVID, ClustalW, DIALIGN and LAGAN. In addition, the average length of the conserved regions identified by ACANA was the longest. Thus, we suggest that ACANA is a useful tool for identifying functional elements in cross-species sequence analysis, such as predicting transcription factor binding sites in non-coding DNA.
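The recursive anchoring idea in the abstract can be illustrated with a small sketch. This is not ACANA's implementation: the record gives no scoring scheme, thresholds, or details of how anchors are joined into a global alignment. The sketch assumes simple match/mismatch scores with a linear gap penalty, and the names `smith_waterman`, `find_anchors` and `min_score` are hypothetical. It finds the best local alignment, keeps it as an anchor, and then repeats the search on the sequence segments to the left and to the right of that anchor.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b under assumed, illustrative scores.

    Returns (score, a_start, a_end, b_start, b_end) with exclusive ends.
    A score of 0 means no positive-scoring local alignment exists.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:  # strict '>' keeps the first best cell
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell to find where the local alignment starts.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, bi, j, bj


def find_anchors(a, b, min_score=6, a_off=0, b_off=0):
    """Recursively collect local alignments as ordered, non-overlapping anchors.

    min_score (hypothetical, must be positive) stops the recursion once the
    remaining segments no longer share a sufficiently strong local match.
    Each anchor is (a_start, a_end, b_start, b_end, score) in the
    coordinates of the original sequences.
    """
    score, a0, a1, b0, b1 = smith_waterman(a, b)
    if score < min_score:
        return []
    left = find_anchors(a[:a0], b[:b0], min_score, a_off, b_off)
    right = find_anchors(a[a1:], b[b1:], min_score, a_off + a1, b_off + b1)
    anchor = (a_off + a0, a_off + a1, b_off + b0, b_off + b1, score)
    return left + [anchor] + right


if __name__ == "__main__":
    # Two shared 10-bp blocks separated by unrelated spacers of unequal length.
    seq_a = "ACGTACGTAC" + "TTTTT" + "GGATCCGGAT"
    seq_b = "ACGTACGTAC" + "C" * 20 + "GGATCCGGAT"
    for anchor in find_anchors(seq_a, seq_b):
        print(anchor)
```

In this toy input the two shared blocks should come back as separate anchors, because bridging the unequal spacers would cost more than it gains. A full global aligner would additionally align the unanchored segments between consecutive anchors, a step this sketch leaves out.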
Pages: 29-34
Number of pages: 6
Related papers
36 in total
  • [1] Altschul SF, 1996, METHOD ENZYMOL, V266, P460
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] Basic local alignment search tool
    Altschul, SF
    Gish, W
    Miller, W
    Myers, EW
    Lipman, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [4] BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations
    Bahr, A
    Thompson, JD
    Thierry, JC
    Poch, O
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (01) : 323 - 326
  • [5] BARTON GJ, 1993, COMPUT APPL BIOSCI, V9, P729
  • [6] Human and mouse gene structure: Comparative analysis and application to exon prediction
    Batzoglou, S
    Pachter, L
    Mesirov, JP
    Berger, B
    Lander, ES
    [J]. GENOME RESEARCH, 2000, 10 (07) : 950 - 958
  • [7] The many faces of sequence alignment
    Batzoglou, S
    [J]. BRIEFINGS IN BIOINFORMATICS, 2005, 6 (01) : 6 - 22
  • [8] AVID: A global alignment program
    Bray, N
    Dubchak, I
    Pachter, L
    [J]. GENOME RESEARCH, 2003, 13 (01) : 97 - 102
  • [9] Fast and sensitive multiple alignment of large genomic sequences (art. no. 66)
    Brudno, M
    Chapman, M
    Göttgens, B
    Batzoglou, S
    Morgenstern, B
    [J]. BMC BIOINFORMATICS, 2003, 4 (1)
  • [10] LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA
    Brudno, M
    Do, CB
    Cooper, GM
    Kim, MF
    Davydov, E
    Green, ED
    Sidow, A
    Batzoglou, S
    [J]. GENOME RESEARCH, 2003, 13 (04) : 721 - 731