Sachem: a chemical cartridge for high-performance substructure search

Cited by: 18
Authors
Kratochvil, Miroslav [1, 2]
Vondrasek, Jiri [1 ]
Galgonek, Jakub [1 ]
Affiliations
[1] CAS, Inst Organ Chem & Biochem, Flemingovo Namesti 2, Prague 16610 6, Czech Republic
[2] Charles Univ Prague, Fac Math & Phys, Dept Software Engn, Malostranske Namesti 25, Prague 11800 1, Czech Republic
Source
JOURNAL OF CHEMINFORMATICS | 2018 / Volume 10
Keywords
Substructure search; Small molecule databases; Molecule cartridges; Inverted indices; SIMILARITY SEARCH; DESCRIPTORS; DATABASE
DOI
10.1186/s13321-018-0282-y
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: Structure search is one of the most valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to improve search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of a molecule to help decide whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size.
Results: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: the first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential, and screen-out efficiency.
Conclusions: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
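The screening idea described in the abstract can be illustrated with a minimal sketch. This is not Sachem's implementation: the class `InvertedFingerprintIndex`, the function `substructure_search`, the string-valued features, and the `verify` callback are all hypothetical names chosen for illustration, and the fingerprint here is just a set of opaque features. The sketch only shows how an inverted index can screen out molecules before the expensive verification step is called.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Set


class InvertedFingerprintIndex:
    """Toy inverted index: fingerprint feature -> ids of molecules having it.

    Hypothetical illustration only; the features are opaque hashable values,
    not the fingerprints designed in the paper.
    """

    def __init__(self) -> None:
        self.postings: Dict[Hashable, Set[int]] = defaultdict(set)
        self.all_ids: Set[int] = set()

    def add(self, mol_id: int, features: Iterable[Hashable]) -> None:
        """Register a molecule under every feature of its fingerprint."""
        self.all_ids.add(mol_id)
        for feature in features:
            self.postings[feature].add(mol_id)

    def screen(self, query_features: Iterable[Hashable]) -> Set[int]:
        """Return ids of molecules whose fingerprint has every query feature.

        A molecule lacking any query feature cannot contain the query as a
        substructure (assuming the features are substructure-monotone), so
        it is screened out without verification.
        """
        lists = [self.postings.get(f, set()) for f in set(query_features)]
        if not lists:
            # An empty query fingerprint screens nothing out.
            return set(self.all_ids)
        # Intersect the shortest posting lists first to shrink the set early.
        lists.sort(key=len)
        candidates = set(lists[0])
        for posting in lists[1:]:
            candidates &= posting
            if not candidates:
                break
        return candidates


def substructure_search(
    index: InvertedFingerprintIndex,
    query_features: Iterable[Hashable],
    verify: Callable[[int], bool],
) -> List[int]:
    """Screen with the index, then call the verifier only on the survivors."""
    candidates = index.screen(query_features)
    return sorted(mol_id for mol_id in candidates if verify(mol_id))
```

In a real cartridge, `verify` would be an exact subgraph-isomorphism check between the query and the candidate molecule; the fewer candidates survive screening, the fewer such calls are needed.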
Pages: 11