KABOOM! A new suffix array based algorithm for clustering expression data

Cited by: 13
Authors
Hazelhurst, Scott [1]
Liptak, Zsuzsanna [2]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, ZA-2050 Johannesburg, South Africa
[2] Univ Salerno, Dipartimento Informat, I-84084 Fisciano, Italy
Funding
National Research Foundation, Singapore;
Keywords
SEQUENCE; TOOL;
DOI
10.1093/bioinformatics/btr560
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Results: We introduce a new filter for string similarity which has the potential to eliminate the need for all-versus-all comparison in clustering of expression data and other similar tasks. Our filter is based on multiple long exact matches between the two strings, with the additional constraint that these matches must be sufficiently far apart. We give details of its efficient implementation using modified suffix arrays. We demonstrate its efficiency by presenting our new expression clustering tool, wcd-express, which uses this heuristic. We compare it to other current tools and show that it is very competitive with respect to both quality and run time.
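
The filter idea described in the abstract can be illustrated with a minimal sketch. This is not the wcd-express implementation: it finds exact matches with a plain hash set of substrings rather than the modified suffix arrays the paper uses, and it measures the distance between matches in the first string only. The function names and the parameters `k` (match length), `min_matches` and `min_gap` are hypothetical, chosen for illustration.

```python
def shared_kmer_positions(a: str, b: str, k: int) -> list[int]:
    """Start positions in `a` of length-k substrings that also occur in `b`."""
    # Hash set of all length-k substrings of b (a stand-in for a suffix array lookup).
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return [i for i in range(len(a) - k + 1) if a[i:i + k] in kmers_b]


def passes_filter(a: str, b: str, k: int = 8,
                  min_matches: int = 2, min_gap: int = 8) -> bool:
    """Return True if `a` has at least `min_matches` exact length-k matches
    with `b` whose start positions in `a` are at least `min_gap` apart.

    Assumption: the separation constraint is checked in `a` only.
    """
    count = 0
    last = None
    # Positions are sorted, so greedily taking the earliest admissible
    # position maximises the number of well-separated matches.
    for pos in shared_kmer_positions(a, b, k):
        if last is None or pos - last >= min_gap:
            count += 1
            last = pos
            if count >= min_matches:
                return True
    return False


# Two sequences sharing a prefix and a suffix, with different middles.
a = "ACGTACGTAA" + "TTTTTTTTTT" + "GGCCGGCCAA"
b = "ACGTACGTAA" + "CCCCCCCCCC" + "GGCCGGCCAA"
# A sequence sharing only the prefix with b.
c = "ACGTACGTAA" + "TTTTTTTTTT" + "TTTTTTTTTT"

print(passes_filter(a, b))  # two well-separated matches
print(passes_filter(c, b))  # matches exist but are clustered together
```

Only pairs that pass such a filter would need to go on to a full similarity comparison, which is how a filter of this kind can avoid all-versus-all comparison.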
Pages: 3348-3355
Page count: 8
Related papers
27 in total
  • [1] Burkhardt S., 1999, INT C COMPUTATIONAL, P77
  • [2] HAZELHURST S, 2008, S AFRICAN COMPUT J, V24, P1542
  • [3] An overview of the wcd EST clustering tool
    Hazelhurst, Scott
    Hide, Winston
    Liptak, Zsuzsanna
    Nogueira, Ramon
    Starfield, Richard
    [J]. BIOINFORMATICS, 2008, 24 (13) : 1542 - 1546
  • [4] HAZELHURST S, 2003, TRWITSCS20031 U WITW
  • [5] mkESA: enhanced suffix array construction tool
    Homann, Robert
    Fleer, David
    Giegerich, Robert
    Rehmsmeier, Marc
    [J]. BIOINFORMATICS, 2009, 25 (08) : 1084 - 1085
  • [6] CAP3: A DNA sequence assembly program
    Huang, XQ
    Madan, A
    [J]. GENOME RESEARCH, 1999, 9 (09) : 868 - 877
  • [7] Data clustering: A review
    Jain, AK
    Murty, MN
    Flynn, PJ
    [J]. ACM COMPUTING SURVEYS, 1999, 31 (03) : 264 - 323
  • [8] KALYANARAMAN A, 2002, P IEEE C HIGH PERF C
  • [9] Levenshtein V.I., 1966, Soviet Physics Doklady
  • [10] Fast sequence clustering using a suffix array algorithm
    Malde, K
    Coward, E
    Jonassen, I
    [J]. BIOINFORMATICS, 2003, 19 (10) : 1221 - 1226