Detection of dispersed short tandem repeats using reversible jump Markov chain Monte Carlo

Times cited: 0
Authors
Liang, Tong [2]
Fan, Xiaodan [1]
Li, Qiwei [1]
Li, Shuo-yen R. [2]
Affiliations
[1] Chinese Univ Hong Kong, Dept Stat, Shatin, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Informat Engn, Shatin, Hong Kong, Peoples R China
Keywords
VARIABLE-NUMBER; DNA; SEQUENCES; GENOMES; DISCOVERY; MODEL; GENE; MICROSATELLITES; MUTATION; IDENTIFICATION
DOI
10.1093/nar/gks644
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Tandem repeats occur frequently in biological sequences and are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat within a sliding window. In this article, we focus on the case in which an unknown number of tandem repeat segments sharing the same pattern are dispersed throughout a sequence. We construct a probabilistic generative model for these tandem repeats, in which the sequence pattern is represented by a motif matrix, and we adopt a Bayesian approach to fit the model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution in order to infer both the motif matrix of the tandem repeats and the locations of the repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic and real data show that this new approach is powerful in detecting dispersed short tandem repeats. To our knowledge, this is the first work to adopt RJMCMC algorithms for the detection of tandem repeats.
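The abstract describes the method only at a high level. The sketch below illustrates the trans-dimensional step under simplifying assumptions that are ours, not the paper's: the motif matrix is held fixed (the paper also infers it by MCMC) and stored as one row of base probabilities per period position; every segment consists of a fixed number of exact motif copies with no insertions or deletions; the background is i.i.d.; and the prior on segment sets is a hypothetical hard-core prior proportional to ρ^K that forbids overlapping segments. Because segment starts are discrete, the reversible-jump acceptance ratio for adding or removing one segment reduces to a Metropolis-Hastings birth/death ratio with no Jacobian term. The names `segment_log_lr` and `birth_death_sampler` are illustrative only.

```python
import math
import random

BASES = "ACGT"


def segment_log_lr(seq, start, length, motif, background):
    """Log-likelihood ratio of seq[start:start+length] under a cyclic motif
    matrix (row j = base probabilities at period position j) versus an i.i.d.
    background. Assumes exact periodicity: no insertions or deletions."""
    w = len(motif)
    total = 0.0
    for offset in range(length):
        b = BASES.index(seq[start + offset])
        total += math.log(motif[offset % w][b]) - math.log(background[b])
    return total


def birth_death_sampler(seq, motif, background, copies, rho, n_iter, seed=0):
    """Toy trans-dimensional sampler over sets of non-overlapping segment starts.

    Unnormalized target (hypothetical prior): rho**K * prod(LR(s) for s in state),
    and zero if any two segments overlap. Returns one sorted list of starts per
    iteration."""
    rng = random.Random(seed)
    m = len(motif) * copies          # segment length = motif width * copies
    n_starts = len(seq) - m + 1      # number of candidate start positions
    log_lr = [segment_log_lr(seq, s, m, motif, background)
              for s in range(n_starts)]

    def accept(log_ratio):
        # Metropolis-Hastings test; min() keeps exp() from overflowing.
        return rng.random() < math.exp(min(0.0, log_ratio))

    state, samples = [], []
    for _ in range(n_iter):
        k = len(state)
        if rng.random() < 0.5:
            # Birth: pick a start uniformly; proposals that overlap an existing
            # segment have zero target mass and are rejected outright.
            s = rng.randrange(n_starts)
            if all(abs(s - t) >= m for t in state):
                # ratio = rho * LR(s) * [1/(k+1)] / [1/n_starts]
                if accept(math.log(rho) + log_lr[s] + math.log(n_starts / (k + 1))):
                    state.append(s)
        elif k > 0:
            # Death: remove an existing segment chosen uniformly (reverse move).
            i = rng.randrange(k)
            if accept(-math.log(rho) - log_lr[state[i]] + math.log(k / n_starts)):
                state.pop(i)
        samples.append(sorted(state))
    return samples


if __name__ == "__main__":
    seq = "T" * 10 + "ACG" * 4 + "T" * 10   # planted repeat starts at index 10
    motif = [[0.97 if b == c else 0.01 for b in BASES] for c in "ACG"]
    background = [0.25] * 4
    samples = birth_death_sampler(seq, motif, background, copies=4,
                                  rho=1 / 21, n_iter=5000, seed=1)
    print(sum(10 in s for s in samples) / len(samples))
```

In the usage example, four copies of `ACG` are planted in a poly-T background. The birth of the planted segment is accepted with probability essentially 1 and its death almost never, so the printed fraction of iterations containing start 10 should be close to 1. A fuller sampler would add moves that update the motif matrix and shift segment boundaries.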
Pages: 8
References
49 in total
[1] Al-Awadhi, F; Hurn, M; Jennison, C. Improving the acceptance rate of reversible jump MCMC proposals. STATISTICS & PROBABILITY LETTERS, 2004, 69(2): 189-198.
[2] [Anonymous]. Bayesian data analysis. 2021.
[3] Bailey, T. P 2 INT C INT SYST M, 1994, 1: 28.
[4] Bao, ZR; Eddy, SR. Automated de novo identification of repeat sequence families in sequenced genomes. GENOME RESEARCH, 2002, 12(8): 1269-1276.
[5] Benson, G. Tandem repeats finder: a program to analyze DNA sequences. NUCLEIC ACIDS RESEARCH, 1999, 27(2): 573-580.
[6] Brooks, SP; Giudici, P; Roberts, GO. Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2003, 65: 3-39.
[7] Bühlmann, P. Model selection for variable length Markov chains and tuning the context algorithm. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2000, 52(2): 287-315.
[8] Cappé, O; Robert, CP; Rydén, T. Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2003, 65: 679-700.
[9] Du, J; Zhu, Y; Shanmugam, A; Kenter, AL. Analysis of immunoglobulin S gamma 3 recombination breakpoints by PCR: implications for the mechanism of isotype switching. NUCLEIC ACIDS RESEARCH, 1997, 25(15): 3066-3073.
[10] Ellegren, H. Microsatellites: Simple sequences with complex evolution. NATURE REVIEWS GENETICS, 2004, 5(6): 435-445.