Detecting DNA Modifications from SMRT Sequencing Data by Modeling Sequence Context Dependence of Polymerase Kinetic

Cited by: 46
Authors
Feng, Zhixing [1, 2, 3]
Fang, Gang [4]
Korlach, Jonas [5]
Clark, Tyson [5]
Luong, Khai [5]
Zhang, Xuegong [1, 2]
Wong, Wing [3]
Schadt, Eric [5, 6]
Affiliations
[1] Tsinghua Univ, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[3] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[4] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN USA
[5] Pacific Biosci, Menlo Pk, CA USA
[6] Mt Sinai Sch Med, Dept Genet & Genom Sci, New York, NY USA
Keywords
SINGLE-MOLECULE;
DOI
10.1371/journal.pcbi.1002935
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single-molecule, real-time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during the DNA synthesis reaction. This makes it possible to estimate the expected kinetic rate of the enzyme at the incorporation site from kinetic rate information collected from existing SMRT sequencing data (historical data) covering the same local sequence contexts of interest. We develop an empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model can greatly increase DNA modification detection accuracy and reduce the control data coverage required. For some DNA modifications that have a strong signal, a control sample is not even needed, because historical data can serve as an alternative to the control. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.
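
The idea of borrowing strength from historical data can be illustrated with a minimal Python sketch. It is not the seqPatch implementation and does not reproduce the paper's model: it assumes a simple normal-normal hierarchy in which kinetic values at a site are normally distributed with a known observation variance, and the prior for a sequence context is fitted by the method of moments to per-site means from historical data. All function names and parameters (prior_from_history, posterior_control_mean, modification_z_score, obs_var) are hypothetical.

```python
import math
from statistics import mean, variance


def prior_from_history(historical_site_means):
    """Fit a normal prior for one sequence context (assumed model).

    Uses the mean and sample variance of per-site mean kinetic values
    observed in historical data sharing this context.
    """
    return mean(historical_site_means), variance(historical_site_means)


def posterior_control_mean(control_obs, prior_mean, prior_var, obs_var):
    """Normal-normal conjugate update of the expected kinetic value.

    With no control observations the prior (historical data alone)
    is returned unchanged, mimicking the control-free use case.
    """
    n = len(control_obs)
    if n == 0:
        return prior_mean, prior_var
    precision = 1.0 / prior_var + n / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(control_obs) / obs_var)
    return post_mean, post_var


def modification_z_score(native_obs, post_mean, post_var, obs_var):
    """Standardized difference between the native mean and the expected value."""
    se = math.sqrt(post_var + obs_var / len(native_obs))
    return (mean(native_obs) - post_mean) / se


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    p_mean, p_var = prior_from_history([0.0, 0.2, -0.2, 0.1, -0.1])
    e_mean, e_var = posterior_control_mean([0.05, -0.05], p_mean, p_var, obs_var=0.5)
    print(modification_z_score([1.4, 1.6, 1.5], e_mean, e_var, obs_var=0.5))
```

In this sketch, low control coverage makes the posterior lean on the historical prior, while high coverage lets the control data dominate; a large absolute z-score flags a candidate modified site.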
Pages: 10