Support for feature,count fingerprints in fingerprint 3.5.0

I’ve just updated the fingerprint package to v3.5.0 (should show up on CRAN shortly, or else you can get it directly from my Github repository). The main update in this version is better support for feature,count type fingerprints. An example would be ECFP or signature fingerprints. In these types of fingerprints, the output is usually a set of (integer or long) hash values or else structural fragments along with their count of occurrences.

The updated package now provides an S4 class to represent features and their counts. An example of this class is

1
2
3
f1 <- new("feature",
          feature="[C]([N]([C]([N]([C][C,1](=[O]))=[O])[C](=[C]([C,1][N]([C,0]))[N](=[C,0]))))",
          count=as.integer(2))

The package provides getters and setters for these objects, allow you to get or set the feature and the count.

1
2
3
4
5
6
7
8
> feature(f1)
[1] "[C]([N]([C]([N]([C][C,1](=[O]))=[O])[C](=[C]([C,1][N]([C,0]))[N](=[C,0]))))"
> count(f1)
[1] 2
> feature(f1) <- 'ABCD'
> count(f1) <- 12
> f1
ABCD:12

Using this class, feature,count fingerprints are now represented as objects of class featvec. For these fingerprints, instead of bits, one obtains a list of feature objects. For fingerprints read from files that provide the hashed version of the underlying structure (or neighborhood etc), the numeric hashes are read in as features, with a default count of 1. The distance method has also been updated to evaluate similarities for feature,count fingerprints, though currently it does not use the count in the similarity calculation.

As an example, consider a set of ECFP’s available from here

1
2
3
4
5
6
7
8
9
10
> fps <- fp.read('http://pastebin.com/raw.php?i=gHjTQNKP', lf=ecfp.lf, binary=FALSE)
> fps[[1]]
Feature fingerprint
 name =  mol01
 source =  ecfp.lf
 features =  17:1 0:1 16:1 3:1 1:1 1747237384:1 1499521844:1 -1539132615:1 1294255210:1 332760439:1 -1549163031:1 1035613116:1 1618154665:1 590925877:1 1872154524:1 -1143715940:1 203677720:1 -1272768868:1 136120670:1 136597326:1 -1460348762:1 -1262922302:1 -1201618245:1 -402549409:1 -1270820019:1 929601590:1 -1597477966:1 -1274743746:1 -1155471474:1 1258428229:1 -1838187238:1 -798628285:1 -1773728142:1 -773983804:1 -453677277:1 1674451008:1 65948508:1 991735244:1 -1412946825:1 846704869:1 -2103621484:1 -886204842:1 1725648567:1 -353343892:1 -585443181:1 -533273616:1 2031084733:1 -801248129:1 1752802620:1 -976015189:1 -992213424:1 2109043264:1 -790336137:1 630139722:1 -505031736:1 -1427697183:1 -2090462286:1 -1724769936:1
> distance(fps[[1]], fps[[1]])
[1] 1
> distance(fps[[1]], fps[[2]])
[1] 0.1566265

2 thoughts on “Support for feature,count fingerprints in fingerprint 3.5.0

  1. Does this also mean support for the new CDK 1.5.x count fingerprints by Jonathan?

Leave a Reply to Egon Willighagen Cancel reply

Your email address will not be published. Required fields are marked *