Based on feedback from the recent R workshop at the EBI, I’ve updated the rcdk package. It now includes more methods that operate on atoms, and parse.smiles has been modified to handle a vector of SMILES strings, which makes it more R-like (thanks to Tobias Verbeke for the patch). In addition, one can now load very large SMILES or SDF files using the iterating readers from the CDK. This feature makes use of the iterators package and lets you write code such as
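    # create an iterator over the file instead of reading it all at once
    iter <- iload.molecules('big.smi', type='smi')
    while(hasNext(iter)) {
      # fetch the next molecule and print its title property
      mol <- nextElem(iter)
      print(get.property(mol, "cdk:Title"))
    }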
As a result, only one molecule is loaded at a time, which lets you process arbitrarily large files. Version 2.9.23 has been uploaded to CRAN and should be available in a day or two.
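For the vectorized parse.smiles mentioned above, a minimal sketch might look like the following. The SMILES strings and the variable names (smiles, mols, n_atoms) are made up for illustration, and the sketch assumes that rcdk and its Java dependencies are installed and that parse.smiles returns one molecule per input string.

```r
library(rcdk)

# Illustrative input: a character vector of SMILES strings (assumed examples)
smiles <- c("CCO", "c1ccccc1", "CC(=O)O")

# With the vectorized parse.smiles, a single call parses the whole vector
mols <- parse.smiles(smiles)

# The result can be processed with the usual apply-style idioms,
# here counting the atoms returned by get.atoms for each molecule
n_atoms <- sapply(mols, function(m) length(get.atoms(m)))
print(n_atoms)
```

Because the whole vector is parsed in memory, this form suits small to moderate inputs; for very large files the iterating reader shown earlier is the better fit.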