LeadMine

Version 4.1.3 [2025-06]

Chemical Text Mining

The ever-increasing rate of publication of scientific literature and patents makes it difficult for researchers in the pharmaceutical industry to stay current with the latest developments and trends. NextMove Software's LeadMine product is a text mining tool for the identification and annotation of chemicals, protein targets, genes, diseases, species, named reactions, company names, cell lines, etc. in the text of documents. Whilst initially developed to identify molecules of interest to medicinal chemists in patent applications, its functionality has since been extended to handle arbitrary entity types specified by dictionaries, ontologies, regular expressions or formal grammars.
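To illustrate what annotating entity types from dictionaries and regular expressions involves, here is a minimal sketch in Python. It is not LeadMine's implementation or API: the `annotate` function, the tiny `DICTIONARY`, and the CAS-number pattern are hypothetical examples chosen for illustration. Each match is reported with its character offsets and entity type, which is the kind of output an annotation tool produces.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # matched text exactly as it appears in the document
    entity_type: str  # label such as "Chemical" or "CASNumber"


# Hypothetical example resources; real dictionaries are far larger.
DICTIONARY = {
    "aspirin": "Chemical",
    "acetylsalicylic acid": "Chemical",
    "ibuprofen": "Chemical",
}

# Hypothetical pattern for CAS Registry Numbers such as 50-78-2.
PATTERNS = {
    "CASNumber": re.compile(r"\b\d{2,7}-\d{2}-\d\b"),
}


def annotate(text: str) -> list[Entity]:
    """Return all dictionary and pattern matches, sorted by position."""
    found = []
    for term, entity_type in DICTIONARY.items():
        # Case-insensitive, whole-word search for each dictionary term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            found.append(Entity(m.start(), m.end(), m.group(), entity_type))
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.start(), m.end(), m.group(), entity_type))
    return sorted(found)


if __name__ == "__main__":
    sample = "Aspirin (acetylsalicylic acid, CAS 50-78-2) and ibuprofen."
    for entity in annotate(sample):
        print(entity)
```

Searching term by term like this is only practical for small dictionaries; the dictionary sizes discussed below call for a different data structure.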

A significant competitive advantage of LeadMine over similar tools is its use of NextMove Software's CaffeineFix automatic spelling correction technology, which allows it to identify (and correct) misspelt terms and entities, including misspellings introduced by optical character recognition (OCR), hyphenation and line-breaking, or human error. This ability to handle noisy real-world text has been shown to significantly improve recall over approaches that do no correction and over methods that rely on simplistic rule-based OCR-correction heuristics.
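The brochure does not describe how CaffeineFix performs its corrections. As a generic illustration of spelling-tolerant matching, the sketch below swaps in a textbook technique: Levenshtein edit distance, with a brute-force scan for the closest dictionary term within a small edit budget. The `correct` function, the `max_edits` parameter, and the three-term `DICTIONARY` are hypothetical. The examples mimic an OCR error ("rn" misread for "m") and a leftover hyphen from line-breaking.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def correct(token: str, dictionary: list[str], max_edits: int = 2):
    """Return (term, distance) for the closest term within max_edits, else None."""
    best = None
    for term in dictionary:
        d = levenshtein(token.lower(), term.lower())
        if d <= max_edits and (best is None or d < best[1]):
            best = (term, d)
    return best


# Hypothetical example dictionary.
DICTIONARY = ["paracetamol", "ibuprofen", "aspirin"]

if __name__ == "__main__":
    print(correct("paracetarnol", DICTIONARY))  # OCR-style "rn" for "m"
    print(correct("ibu-profen", DICTIONARY))    # leftover hyphen from line-breaking
```

Comparing the token against every term costs time proportional to the dictionary size, so this sketch would not scale to the tens of millions of synonyms mentioned below. It only shows what "within a few edits" means.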

LeadMine and CaffeineFix can also efficiently handle very large dictionaries, which often contain tens of millions of terms and synonyms. Synonym dictionaries of this size are not uncommon in chemical and biological text mining, yet they are often problematic for text mining tools that were not designed for processing scientific and technical documents.
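One standard way to keep dictionary lookup fast as a dictionary grows is a trie (prefix tree). Finding a match then costs time proportional to the length of the text being examined, not the number of terms. The sketch below is a textbook trie with greedy longest-match scanning. It is an assumed illustration, not a description of how LeadMine or CaffeineFix store their dictionaries. `Trie`, `longest_match` and `scan` are hypothetical names, matching is case-sensitive, and matches must start and end on word boundaries.

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.terminal = False  # True if a dictionary term ends at this node


class Trie:
    def __init__(self, terms=()) -> None:
        self.root = TrieNode()
        for term in terms:
            self.add(term)

    def add(self, term: str) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def longest_match(self, text: str, start: int):
        """End index of the longest term starting at text[start] and ending on a word boundary, or None."""
        node, end = self.root, None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.terminal and (i + 1 == len(text) or not text[i + 1].isalnum()):
                end = i + 1
        return end


def scan(text: str, trie: Trie) -> list[str]:
    """Greedy left-to-right scan returning non-overlapping longest matches."""
    matches, i = [], 0
    while i < len(text):
        at_word_start = i == 0 or not text[i - 1].isalnum()
        end = trie.longest_match(text, i) if at_word_start else None
        if end is not None:
            matches.append(text[i:end])
            i = end
        else:
            i += 1
    return matches


if __name__ == "__main__":
    trie = Trie(["sodium", "sodium chloride", "chloride", "water"])
    print(scan("Dissolve sodium chloride in water.", trie))
```

Preferring the longest match means "sodium chloride" is reported as one entity rather than as "sodium" followed by "chloride".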

Another unique feature of LeadMine is its ability to perform chemical named entity recognition on Chinese (both simplified and traditional) and Japanese documents.

Arthor provides fast, state-of-the-art substructure and chemical similarity search capabilities for ultra-large databases of hundreds of millions of compounds, using SMARTS optimization, just-in-time (JIT) compilation and/or GPUs.

CaffeineFix is used to rapidly match chemical names or terms against a dictionary or grammar (e.g. a grammar for IUPAC names). As well as use in text-mining, it can be used to provide autocomplete functionality and spell-correction.
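The autocomplete use case can be illustrated with a simple, generic technique. Keep the terms in sorted order and binary-search for the prefix. Every term that shares the prefix then sits in one contiguous run. This is an assumed sketch, not CaffeineFix's mechanism or API. The `autocomplete` function, its `limit` parameter and the `TERMS` list are hypothetical, and matching is case-sensitive.

```python
from bisect import bisect_left


def autocomplete(prefix: str, sorted_terms: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` terms starting with `prefix`, in sorted order.

    `sorted_terms` must already be sorted; matching is case-sensitive.
    """
    i = bisect_left(sorted_terms, prefix)  # first position where `prefix` would be inserted
    completions = []
    # All terms sharing the prefix are contiguous in sorted order, starting at i.
    while i < len(sorted_terms) and len(completions) < limit and sorted_terms[i].startswith(prefix):
        completions.append(sorted_terms[i])
        i += 1
    return completions


# Hypothetical example term list, sorted once up front.
TERMS = sorted(["benzene", "benzaldehyde", "benzoic acid", "benzyl alcohol", "butane", "toluene"])

if __name__ == "__main__":
    print(autocomplete("benz", TERMS))
    print(autocomplete("benz", TERMS, limit=2))
```

The binary search makes each lookup logarithmic in the number of terms. Combining completion with spell-correction, as the text describes, would need an error-tolerant search such as the edit-distance approach sketched earlier.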

Casandra is a server that delivers real-time warnings of experimental safety hazards directly to pharmaceutical electronic laboratory notebooks (ELNs).

HazELNut is a suite of tools used to extract, normalize and analyse information in electronic lab notebooks (ELNs). It can be used to implement a search interface, find and eliminate duplicates, find similar reactions and so on.

LeadMine extracts chemical names and terms from text. It incorporates NextMove's CaffeineFix technology to find terms that match appropriate dictionaries or grammars, and it has enhanced functionality for handling patent literature.

Matsy is a set of tools for creating and analysing Matched Molecular Series (the general form of Matched Molecular Pairs). In particular, it can be used to suggest which compound to make next in a medicinal chemistry program.

MPSearch rapidly searches a database to find Matched Pairs related to a query molecule. This type of search is used to explore previous medicinal chemistry strategies.

NameRXN is used to classify and name reactions. It is particularly useful in the context of ELN analysis, and also as a plugin for chemical drawing software. NameRXN builds on NextMove Software's Patsy technology.

Patsy is used to speed up SMARTS pattern matching by creating optimized SMARTS patterns or source code. Speed gains are particularly large when multiple SMARTS patterns are matched against a single structure.

Pistachio is a reaction dataset browser that supports loading, querying and analysis of chemical reactions. With over 21 million chemical reactions extracted from US and European (EPO) patents, it demonstrates an AI interface to faceted (structure) search.

SmallWorld is an index of chemical space based on more than 230 billion molecular substructures. It can be used to measure similarity based on graph-edit distance, find the maximum common substructure (MCS) of two or more molecules, analyse high-throughput screening (HTS) results and much more.

Sugar & Splice can be used to perceive and depict biopolymer structure. It makes it easy to interconvert between small-molecule representations (e.g. SMILES, MOL) and biopolymer representations (HELM, IUPAC line notation).

General Inquiries: info@nextmovesoftware.com
Support: support@nextmovesoftware.com
