Secret sequence comparison in distributed computing environments by Interval Sampling

Cited by: 0
Authors
Kurata, K [1]
Breton, V [1]
Nakamura, H [1]
Affiliations
[1] Univ Tokyo, Res Ctr Adv Sci & Technol, Meguro Ku, Tokyo 1538904, Japan
Source
PROCEEDINGS OF THE 2004 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY | 2004
Keywords
sequence comparison; sampling; secret; grid;
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Once a new gene has been sequenced, it must be verified whether or not it is similar to previously sequenced genes. In many cases, the organization that sequenced a potentially novel gene needs to keep the sequence itself confidential. However, to compare the potentially novel sequence with known sequences, it must either be sent as a query to public databases, or these databases must be downloaded onto a local computer. In both cases, the potentially new sequence is exposed to the public. In this work, we propose a new method, called Interval Sampling, to compare sequences without leaking exact information about the new sequence. To keep the exact sequence information secret, this method samples intervals (subsequences) from a sequence and hashes them. The hashed data are made public so that the novelty of the sequence can be verified. We find that this method works well in parallel in a distributed computing environment, such as the Grid. The experimental results for 19797 H. sapiens genes and 25000 M. musculus genes show that the parallel implementation of this method performs reasonably well in terms of speed and memory usage.
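The idea of sampling intervals and publishing only their hashes can be illustrated with a minimal sketch. The abstract does not specify the interval length, the sampling scheme, or the hash function, so everything below is an assumption chosen for illustration: fixed-length intervals taken at a fixed stride, SHA-256 as the hash, and hypothetical function names (`sample_intervals`, `hash_intervals`, `shared_fraction`). It is not the authors' implementation.

```python
import hashlib


def sample_intervals(seq, length=8, stride=4):
    """Return fixed-length subsequences starting every `stride` positions.

    The fixed length and stride are assumptions; the abstract only says
    that intervals (subsequences) are sampled.
    """
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, stride)]


def hash_intervals(seq, length=8, stride=4):
    """Hash each sampled interval; only these digests would be shared."""
    return {
        hashlib.sha256(interval.encode("ascii")).hexdigest()
        for interval in sample_intervals(seq, length, stride)
    }


def shared_fraction(query_hashes, reference_hashes):
    """Fraction of the query's hashed intervals that also occur in the reference."""
    if not query_hashes:
        return 0.0
    return len(query_hashes & reference_hashes) / len(query_hashes)


# The secret query publishes hashes of sampled intervals only.
query = "ACGTACGTTAGCCGATAGGCTA"
query_hashes = hash_intervals(query, length=8, stride=4)

# A public reference is hashed at every position (stride 1) so that a match
# does not depend on where the query happens to start inside it.
reference = "TT" + query + "GG"
reference_hashes = hash_intervals(reference, length=8, stride=1)

similarity = shared_fraction(query_hashes, reference_hashes)
```

In this sketch a high `similarity` suggests the query is not novel with respect to that reference, while the query itself is never transmitted. Because each reference sequence is hashed independently, the comparison splits naturally across worker nodes, which is consistent with the parallel setting the abstract describes.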
Pages: 198-205
Number of pages: 8