Kpax3: Bayesian bi-clustering of large sequence datasets

Cited by: 8
Authors
Pessia, Alberto [1]
Corander, Jukka [1, 2, 3]
Affiliations
[1] Univ Helsinki, Dept Math & Stat, FIN-00014 Helsinki, Finland
[2] Univ Oslo, Dept Biostat, N-0317 Oslo, Norway
[3] Wellcome Trust Sanger Inst, Pathogen Genom, Hinxton CB10 1SA, England
Funding
Academy of Finland;
Keywords
VACCINE;
DOI
10.1093/bioinformatics/bty056
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Estimating the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations discriminate best between groups of samples for a given data partition. Automated discovery of interesting patterns in the data can help to generate new biological hypotheses.
Results: We introduce Kpax3, a Bayesian method for bi-clustering multiple sequence alignments. The influence of individual sites is determined in a supervised manner by using informative prior distributions for the model parameters. Our inference method uses an implementation of both split-merge and Gibbs sampler type MCMC algorithms to traverse the joint posterior of partitions of samples and variables. We use a large Rotavirus sequence dataset to demonstrate the ability of Kpax3 to generate biologically important hypotheses about differential selective pressures across a virus protein.
Pages: 2132-2133
Page count: 2
Related papers
9 items in total
[1]   Rotavirus Overview [J].
Bernstein, David I. .
PEDIATRIC INFECTIOUS DISEASE JOURNAL, 2009, 28 (03) :S50-S53
[2]   Julia: A Fresh Approach to Numerical Computing [J].
Bezanson, Jeff ;
Edelman, Alan ;
Karpinski, Stefan ;
Shah, Viral B. .
SIAM REVIEW, 2017, 59 (01) :65-98
[3]   Virus Variation Resource-recent updates and future directions [J].
Brister, J. Rodney ;
Bao, Yiming ;
Zhdanov, Sergey A. ;
Ostapchuck, Yuri ;
Chetvernin, Vyacheslav ;
Kiryutin, Boris ;
Zaslavsky, Leonid ;
Kimelman, Michael ;
Tatusova, Tatiana A. .
NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) :D660-D665
[4]   Dense genomic sampling identifies highways of pneumococcal recombination [J].
Chewapreecha, Claire ;
Harris, Simon R. ;
Croucher, Nicholas J. ;
Turner, Claudia ;
Marttinen, Pekka ;
Cheng, Lu ;
Pessia, Alberto ;
Aanensen, David M. ;
Mather, Alison E. ;
Page, Andrew J. ;
Salter, Susannah J. ;
Harris, David ;
Nosten, Francois ;
Goldblatt, David ;
Corander, Jukka ;
Parkhill, Julian ;
Turner, Paul ;
Bentley, Stephen D. .
NATURE GENETICS, 2014, 46 (03) :305-+
[5]  
Hoshino Y, 2000, J HEALTH POPUL NUTR, V18, P5
[6]  
Kilbourne ED, 2002, P NATL ACAD SCI USA, V99, P10748, DOI 10.1073/pnas.162366899
[7]   Bayesian search of functionally divergent protein subgroups and their function specific residues [J].
Marttinen, Pekka ;
Corander, Jukka ;
Toronen, Petri ;
Holm, Liisa .
BIOINFORMATICS, 2006, 22 (20) :2466-2474
[8]  
Mirkin B.G., 1996, MATH CLASSIFICATION, V11
[9]   K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets [J].
Pessia, Alberto ;
Grad, Yonatan ;
Cobey, Sarah ;
Puranen, Juha Santeri ;
Corander, Jukka .
MICROBIAL GENOMICS, 2015, 1 (01)