Enrichment of regulatory signals in conserved non-coding genomic sequence

Cited by: 122
Authors
Levy, S
Hannenhalli, S
Workman, C
Affiliations
[1] Celera Genom Corp, Informat Res, Rockville, MD 20850 USA
[2] Tech Univ Denmark, Ctr Biol Sequence Anal, DK-2800 Lyngby, Denmark
DOI
10.1093/bioinformatics/17.10.871
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Whole genome shotgun sequencing strategies generate sequence data prior to the application of assembly methodologies that result in contiguous sequence. Sequence reads can be employed to indicate regions of conservation between closely related species for which only one genome has been assembled. Consequently, by using pairwise sequence alignment methods it is possible to identify novel, non-repetitive, conserved segments in non-coding sequence that exist between the assembled human genome and mouse whole genome shotgun sequencing fragments. Conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation.

Results: Local sequence alignment methods were applied employing mouse fragments and the assembled human genome. In addition, transcription factor binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of different human diseases taken from the Online Mendelian Inheritance in Man database. Using statistical arguments, we have shown that conserved non-coding segments contain an enrichment of transcription factor binding sites when compared to the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome, and therefore their identification provides a rapid means of finding genes with related conserved regions and, consequently, potentially related regulatory mechanisms. Conserved segments in upstream regions are found to contain binding sites that are co-localized in a manner consistent with experimentally known transcription factor pairwise co-occurrences and afford the identification of novel co-occurring transcription factor (TF) pairs.
This study provides a methodology and more evidence to suggest that conserved non-coding regions are biologically significant since they contain a statistical enrichment of regulatory signals and pairs of signals that enable the construction of regulatory models for human genes.
Pages: 871-877
Number of pages: 7