Arthor
Version 4.2.4 [Nov 2024]
High-Performance Chemical Database Searching
NextMove Software's Arthor technology (named after Merlin's apprentice) pushes the performance limits of chemical database search on current computer hardware. Building upon NextMove Software's Patsy chemical pattern matching engine, Arthor easily outperforms current chemical cartridges, scaling to handle the hundreds of millions of compounds found in next-generation chemical databases.
Substructure Searching
Traditional chemical database search engines rely on successful fingerprint screening to achieve high-performance substructure search. As a result, relatively broad queries that screen poorly perform significantly worse, which hurts both average and worst-case search times. By tackling the computationally intensive SMARTS matching phase of a search, Arthor dramatically improves worst-case (and therefore average) search times, achieving the real-time performance bounds required by interactive users.
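To illustrate why broad queries screen poorly, here is a minimal sketch of a generic fingerprint screen. It is not Arthor's implementation: the names `passes_screen` and `screen` and the toy 4-bit fingerprints are assumptions made for illustration. A target can only contain the query substructure if every bit set in the query fingerprint is also set in the target fingerprint, so a query that sets few bits eliminates few candidates and leaves most of the work to the atom-by-atom matching phase.

```python
# Hypothetical sketch of fingerprint screening; fingerprints are plain
# Python ints used as bitsets. Not Arthor's actual code.

def passes_screen(query_fp, target_fp):
    """Return True if every bit set in the query is also set in the target."""
    return query_fp & target_fp == query_fp


def screen(query_fp, database):
    """Return indices of database entries that survive the screen.

    Survivors still need a full substructure (SMARTS) match; the screen
    only rules out entries that cannot possibly match.
    """
    return [i for i, fp in enumerate(database) if passes_screen(query_fp, fp)]


# Toy database of 4-bit fingerprints (made up for illustration).
database = [0b1011, 0b0110, 0b1111, 0b0001]

specific = screen(0b1010, database)  # two query bits set: half the entries are pruned
broad = screen(0b0000, database)     # no query bits set: nothing is pruned
```

In this toy case the specific query leaves two candidates for full matching, while the broad query leaves all four, which is the situation where the cost of the matching phase dominates.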
Similarity Searching
Similarity searches using fingerprint-based Tanimoto scores typically rely on a popcount-sorted index to bound and improve search times. Unfortunately, the popular search bounds described by Swamidass and Baldi (2007) are only effective for denser path-based fingerprints. Sparser circular fingerprints (e.g. ECFP) see little or no benefit from these bounds, so other techniques are required to improve search speed. Arthor uses on-the-fly code generation to create query-specific machine instructions, and a linear-time sort algorithm to rank and page results. Databases containing hundreds of millions of hits can be interactively queried in real time.
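For readers unfamiliar with popcount bounding, the following is a minimal sketch of the general idea, not of Arthor's own approach. It assumes fingerprints stored as Python ints, and the function names (`popcount_bounds`, `search`) are hypothetical. For a query with a set bits and a Tanimoto threshold t, a target with b set bits can only reach the threshold if a·t ≤ b ≤ a/t, so a popcount-sorted index lets the search skip everything outside that range.

```python
# Hypothetical sketch of popcount-bounded Tanimoto search.
# Not Arthor's implementation; names and data are illustrative.
import bisect
import math
from fractions import Fraction


def popcount(fp):
    """Number of set bits in a fingerprint stored as an int."""
    return bin(fp).count("1")


def tanimoto(a, b):
    """Exact Tanimoto score: |a AND b| / |a OR b| (0 if both are empty)."""
    union = popcount(a | b)
    return Fraction(popcount(a & b), union) if union else Fraction(0)


def popcount_bounds(query_bits, threshold):
    """Range of target popcounts that can still reach the threshold.

    The score is at most min(a, b) / max(a, b), which gives
    a * t <= b <= a / t. Fractions avoid float rounding at the edges.
    """
    return math.ceil(query_bits * threshold), math.floor(query_bits / threshold)


def search(query, fingerprints, threshold):
    """Return (score, fingerprint) pairs with score >= threshold."""
    ordered = sorted(fingerprints, key=popcount)   # popcount-sorted index
    counts = [popcount(fp) for fp in ordered]
    lo, hi = popcount_bounds(popcount(query), threshold)
    start = bisect.bisect_left(counts, lo)         # first entry with popcount >= lo
    stop = bisect.bisect_right(counts, hi)         # one past last entry with popcount <= hi
    scored = ((tanimoto(query, fp), fp) for fp in ordered[start:stop])
    return [(score, fp) for score, fp in scored if score >= threshold]


# Toy 8-bit fingerprints (made up for illustration).
fps = [0b11110000, 0b11100000, 0b00001111, 0b00000001, 0b11111111]
hits = search(0b11110000, fps, Fraction(1, 2))
```

Here the bounds only exclude the single-bit fingerprint; the entry `0b00001111` falls inside the popcount range but shares no bits with the query, so it still has to be scored before it can be rejected.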
More info:
- John Mayfield (né May) and Roger Sayle. Substructure Search Face-off, CCNM, May 2015. PDF
- Roger Sayle. Recent Advances in Chemical and Biological Search Systems: Evolution vs. Revolution, ICCS 2018, May 2018. PDF
- John Mayfield. PAINS in the butt, CCNM, Feb 2019. PDF
- John Mayfield and Roger Sayle. The Secrets of Fast SMARTS Matching, 8th Joint Sheffield Conference on Chemoinformatics, June 2019. PDF
