Faster sequence homology searches by clustering subsequences

Cited by: 29
Authors
Suzuki, Shuji [1, 2]
Kakuta, Masanori [1]
Ishida, Takashi [1]
Akiyama, Yutaka [1, 2]
Affiliations
[1] Tokyo Inst Technol, Grad Sch Informat Sci & Engn, Tokyo 1528550, Japan
[2] Tokyo Inst Technol, Educ Acad Computat Life Sci ACLS, Tokyo 1528550, Japan
Keywords
READ ALIGNMENT; CD-HIT; PROTEIN; GENERATION; DATABASE; BLAST; SETS; TOOL
DOI
10.1093/bioinformatics/btu780
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. Results: We developed a fast homology search method based on database subsequence clustering and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an approximately 2-fold increase in speed without a large decrease in search sensitivity. When measured with metagenomic data, GHOSTZ is approximately 2.2-2.8 times faster than RAPSearch and approximately 185-261 times faster than BLASTX.
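The candidate-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the GHOSTZ implementation: it assumes fixed-length subsequences, plain Hamming distance as the metric, and a simple greedy clustering, and the names `hamming`, `cluster_subsequences`, and `search` are hypothetical. Each database subsequence stores its distance to a cluster representative, so a single query-to-representative distance yields a lower bound on the query-to-member distance, and members whose bound exceeds the threshold are skipped without being compared.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_subsequences(subseqs, radius):
    """Greedy clustering (an assumption of this sketch): a subsequence joins
    the first representative within `radius`, otherwise it starts a cluster.
    Returns a list of (representative, [(member, distance_to_rep), ...])."""
    clusters = []
    for s in subseqs:
        for rep, members in clusters:
            d = hamming(rep, s)
            if d <= radius:
                members.append((s, d))
                break
        else:
            clusters.append((s, [(s, 0)]))
    return clusters


def search(query, clusters, threshold):
    """Return (hits, skipped): members within `threshold` of the query, and
    how many members were pruned without a direct comparison."""
    hits, skipped = [], 0
    for rep, members in clusters:
        d_query_rep = hamming(query, rep)
        for member, d_rep_member in members:
            # Triangle inequality: d(q, m) >= |d(q, r) - d(r, m)|.
            if abs(d_query_rep - d_rep_member) > threshold:
                skipped += 1
                continue
            if hamming(query, member) <= threshold:
                hits.append(member)
    return hits, skipped


database = ["ACDEFG", "ACDEFH", "ACDKFG", "WWWWWW", "WWWWWY"]
clusters = cluster_subsequences(database, radius=1)
hits, skipped = search("ACDEFG", clusters, threshold=1)
```

Because the lower bound never exceeds the true distance, pruning in this sketch discards no member that a brute-force scan would report; it only avoids comparisons that cannot succeed.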
Pages: 1183-1190
Number of pages: 8