I came across an ASAP paper today describing substructure searching in Oracle databases. The paper comes from the folks at J & J and is part of their series of papers on the ABCD platform. Performing substructure searches in databases is certainly not a new topic, and various products are out there that support this […]
Caching SMARTS Queries
Andrew Dalke recently published a detailed write-up on his implementation of the PubChem fingerprints and provided a pretty thorough comparison with the CDK implementation. He pointed out a number of bugs in the CDK version, but he also noted that performance could be improved by caching parsed SMARTS queries, which are used extensively […]
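The caching idea can be illustrated with a minimal memoization sketch. Everything here is hypothetical: `parse_smarts` is a stand-in for a toolkit's (expensive) SMARTS parser and just returns a tuple, and `get_query` is an illustrative wrapper, not the CDK API. The point is that each distinct SMARTS string is parsed once and the parsed query is reused on later lookups. It assumes the parsed query objects are safe to share, i.e. matching does not mutate them.

```python
from functools import lru_cache

# Counts how often the (hypothetical) parser actually runs.
PARSE_CALLS = 0


def parse_smarts(smarts: str) -> tuple:
    """Hypothetical stand-in for an expensive SMARTS parser.

    A real toolkit would return a query object; a tuple keeps this
    sketch self-contained.
    """
    global PARSE_CALLS
    PARSE_CALLS += 1
    return ("query", smarts)


@lru_cache(maxsize=1024)
def get_query(smarts: str) -> tuple:
    """Return the parsed query for a SMARTS string, parsing it only once."""
    return parse_smarts(smarts)


# Fingerprint-style code evaluates the same patterns over and over.
patterns = ["[OX2H]", "c1ccccc1", "[OX2H]", "c1ccccc1", "[NX3]"]
queries = [get_query(p) for p in patterns]
print(PARSE_CALLS)  # 3: one parse per distinct pattern
```

With a fixed set of patterns (as in a fingerprint definition), a bounded cache like this effectively becomes a parse-once table.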
Visualizing PAINS SMARTS
A few days ago I made available a SMARTS version of the PAINS substructural filters, which were converted using CACTVS from the original SLN patterns. I mentioned that the SMARTSViewer application was a handy way to visualize the complex SMARTS patterns. Matthias Rarey let me know that his student had converted all the […]
Substructure Searching with Hadoop
My last two posts have described recent attempts at working with Hadoop, a map/reduce framework. As I noted, using Hadoop for cheminformatics is quite trivial when working with SMILES files, which are line-oriented, but it requires a bit more work when dealing with multi-line records such as those in SD files. But now that we have a […]
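The extra work for SD files comes from the record boundary: each SD record spans many lines and ends with a line containing only `$$$$`, so a line-oriented reader would hand a mapper fragments of molecules. The sketch below shows only the grouping logic, as a hypothetical `sd_records` generator; it is not the approach from the posts themselves, and in Hadoop the equivalent logic would have to live in a custom input/record-reading layer, which is not shown here.

```python
from typing import Iterable, Iterator, List


def sd_records(lines: Iterable[str]) -> Iterator[str]:
    """Group the lines of an SD file into whole records.

    A record ends at a line containing only '$$$$'. The terminator is
    kept so each yielded string is a complete, self-delimited record.
    """
    buffer: List[str] = []
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if line.strip() == "$$$$":
            yield "\n".join(buffer)
            buffer = []
    # Emit a trailing record that lacks the terminator, if it has content.
    if any(ln.strip() for ln in buffer):
        yield "\n".join(buffer)


sample = ["mol1\n", "  header\n", "\n", "M  END\n", "$$$$\n",
          "mol2\n", "M  END\n", "$$$$\n"]
for record in sd_records(sample):
    print(repr(record))
```

For SMILES input no such step is needed, since every line is already one molecule.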
Manipulating SMARTS Queries
Yesterday, Andreas Maunz asked a question on the openbabel-discuss list: … a possibility to combine two distinct smarts patterns. Of course, there is the comma OR operator (s1,s2), but I was thinking of a more sophisticated combination, where equal parts of s1 and s2 are “mapped onto each other” in a way such that merely […]
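For reference, the simple OR baseline that the question wants to go beyond can be written with recursive SMARTS: wrapping each pattern as `$(...)` and joining them with the comma operator inside one atom bracket. The hypothetical `or_smarts` helper below only builds that string; it does not validate the inputs, and the resulting query matches an atom that is the first atom of a match to either pattern. It does not map the common parts of the two patterns onto each other, which is the more sophisticated combination being asked about.

```python
def or_smarts(*patterns: str) -> str:
    """Combine SMARTS patterns into a single atom-level OR (hypothetical helper).

    Each pattern becomes a recursive SMARTS $(...) term, and the terms
    are joined with ',' (OR) inside one atom bracket. Inputs are assumed
    to be valid SMARTS; no parsing or checking is done.
    """
    if not patterns:
        raise ValueError("need at least one SMARTS pattern")
    return "[" + ",".join(f"$({p})" for p in patterns) + "]"


print(or_smarts("C(=O)O", "C(=O)N"))  # [$(C(=O)O),$(C(=O)N)]
```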