Statistical Significance of Optical Map Alignments

Cited by: 16
Authors
Sarkar, Deepayan [1]
Goldstein, Steve [2]
Schwartz, David C. [2, 3, 4]
Newton, Michael A. [5, 6]
Affiliations
[1] Indian Stat Inst, Theoret Stat & Math Unit, New Delhi 110016, India
[2] Univ Wisconsin, Lab Mol & Computat Genom, Madison, WI USA
[3] Univ Wisconsin, Dept Chem, Genet Lab, Madison, WI 53706 USA
[4] Univ Wisconsin, Ctr Biotechnol, Madison, WI 53705 USA
[5] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[6] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI USA
Funding
National Institutes of Health (NIH), USA
Keywords
conditional inference; permutation; single molecule; structural variation; human genome; sequence; polymorphism
DOI
10.1089/cmb.2011.0221
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities.
Pages: 478-492
Number of pages: 15