So much to do, so little time

Trying to squeeze sense out of chemical data

rcdk On GitHub

I’ve been working on the rcdk and rcdklibs R packages that integrate the CDK into the R environment. For the past year or two they’ve been hosted on r-forge.r-project.org, which has a nice feature of nightly builds of R packages. Unfortunately its version control system is Subversion. Having used Git for my other projects, this was getting painful. So rcdk and rcdklibs are now hosted on GitHub as the cdkr project. This makes development much smoother, but I do lose the automated build feature. Sometime in the future, I’ll set up the repository to provide direct links to the packages (though they will also be available on CRAN).

Written by Rajarshi Guha

May 3rd, 2010 at 3:42 am

Posted in software,cheminformatics


CDK Development with Git

As announced on the cdk-devel mailing list, the project has shifted to Git as its version control system. While the SVN repository is still available, it’s expected that all development from now on will be done using Git branches. To get things going Egon has put up a very nice page on the CDK wiki.

The use of Git makes development much more enjoyable: if nothing else, fast branching allows one to easily test out multiple, disparate ideas and cleanly merge them back into the mainline code. A side effect is that it also allows easy code review, which is extremely valuable in a distributed project such as the CDK. In this post I thought it’d be useful to describe my workflow using Git.

Set up

I won’t say much about this as the wiki page provides sufficient detail. In brief, we make a clone of the CDK master (located on Sourceforge):

    git clone git://cdk.git.sourceforge.net/gitroot/cdk

This gives us a directory called cdk. This is your local copy of the CDK master branch and periodically, you should do a git pull to make sure it’s in sync with what is on Sourceforge. Now, rather than work directly on this local copy, we make branches for each idea we want to try out.

    git checkout -b myIdea1
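As a rough sketch of the same step, the branch creation could be wrapped in a small helper that always branches from the local master and refuses to reuse an existing name. The function name start_idea is hypothetical (it is not part of Git or the CDK tooling), and the sketch assumes the local branch that mirrors Sourceforge is called master.

```shell
# start_idea BRANCH: create a topic branch off the local master and switch to it.
# Hypothetical helper for illustration; assumes a local branch named "master".
start_idea() {
    name=$1
    if [ -z "$name" ]; then
        echo "usage: start_idea BRANCH" >&2
        return 2
    fi
    # Refuse to reuse the name of an existing local branch.
    if git show-ref --verify --quiet "refs/heads/$name"; then
        echo "branch '$name' already exists" >&2
        return 1
    fi
    # Branch from master explicitly, whatever happens to be checked out now.
    git checkout -q -b "$name" master
}
```

Run inside the clone, start_idea myIdea1 would have the same effect as the checkout above when master is the current branch.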

This way you can be working on multiple projects, each independent of the others. For example, my setup looks like this:

    [rguha@localhost cdk]$ git branch -a
      master
    * pcore
      origin/HEAD
      origin/cdk-1.2.x
      origin/master

The asterisk indicates I’m working on my pcore branch. master represents my local copy of the Sourceforge master.

Making branches available

Given a Git repository there are a number of ways you can make it available. Approaches include emailing patches, creating bundles or setting up your own Git server. The last is worthwhile if you have a constant connection and multiple developers working on your code. Such a server can be based on Apache and WebDAV and is described here. Another alternative is to use Gitosis, which makes life quite easy, though I haven’t tried it myself.

In my case, I use a shortcut (this is based on a Linux server). When I want to make stuff available to the world, I simply run git-daemon and point it to my local CDK repository:

    sudo -u git git-daemon --base-path /home/rguha/src/java/cdk.git --export-all

Here the base-path indicates the directory that contains the repository directory. By doing export-all we indicate that all repositories under base-path be made available. In my case, there’s just the CDK repository. I created a user called git so that its permissions are a little restricted. However, this is likely not a very secure setup. I haven’t checked whether other users can write to my repository (which I’d rather not have them do!). Also, since I’m not going through xinetd I miss out on those benefits as well. Also see here on the use of git-daemon.

Having said that, this setup is quite handy, since whenever I want others to look at a branch of mine (say for merging with the Sourceforge master) I can simply ask them to do

    git pull git://rguha.ath.cx/cdk pcore

and they will be able to pull all the latest commits from the pcore branch. If the work in this branch gets accepted I can simply shut down the daemon, until I decide to make some new work available to the world. Since this is my own machine and I’m the only developer working on it, I don’t need a persistent Git server, so this works great.

My workflow

So, now that I can easily make branches available to the world, how do I arrange my workflow? The approach I’ve taken is the following. My local master never has work done on it directly. The only operation I perform is pull. In other words I just make sure that it’s in sync with the Sourceforge master. (Of course for really minor edits I might just make them in my local master and make them available). All other ideas are tested out in a branch.

Before working on a branch I’d switch to master, do a pull, switch to a branch I’d like to work on and then do a rebase (see here for a nice explanation) in this branch, so that it’s in sync with the Sourceforge master.

Now, the CDK policy is that only the branch manager will be able to write to the Sourceforge master. Thus I cannot push changes to it and must request a review of my code, which I can do by starting the daemon as described above and pointing people to the branch. Once it’s accepted and merged into the Sourceforge master I can then pull to my local master.

As an example, here are the commands that make up my workflow:

    $ git checkout master
    Switched to branch "master"
    $ git pull
    remote: Counting objects: 13, done.
    remote: Compressing objects: 100% (8/8), done.
    remote: Total 8 (delta 6), reused 0 (delta 0)
    Unpacking objects: 100% (8/8), done.
    From git://cdk.git.sourceforge.net/gitroot/cdk
       f927231..a4f56f2  cdk-1.2.x  -> origin/cdk-1.2.x
       5bab4a2..2a2bfe6  master     -> origin/master
    Updating 5bab4a2..2a2bfe6
    Fast forward
     README                       |    2 +-
     src/META-INF/extra.datafiles |    2 --
     2 files changed, 1 insertions(+), 3 deletions(-)
    $ git checkout pcore
    Switched to branch "pcore"
    $ git rebase master
    First, rewinding head to replay your work on top of it...
    Applying: provided toString methods for the pcore query classes
    Applying: Added test methods for the new toString methods. Also added test method annotations
    /home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:76: trailing whitespace.
    Assert.assertEquals("AC::Amine [[CX2]N]::aromatic [c1ccccc1]::blah [C]::[54.74 - 54.74]", repr);
    /home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:110: trailing whitespace.
    Assert.assertEquals("DC::Amine [[CX2]N]::aromatic [c1ccccc1]::[1.0 - 2.0]", repr);
    warning: 2 lines add whitespace errors.
    Applying: Updated the toString tests
    /home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:15: trailing whitespace.
    Assert.assertEquals(0, repr.indexOf("AC::Amine [[CX2]N]::aromatic [c1ccccc1]::blah [C]::[54.74 - 54.74]"));
    /home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:28: trailing whitespace.
    String repr = qbond1.toString();
    /home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:29: trailing whitespace.
    Assert.assertEquals(0, repr.indexOf("DC::Amine [[CX2]N]::aromatic [c1ccccc1]::[1.0 - 2.0]"));
    warning: 3 lines add whitespace errors.
    Applying: Refactored to provide a query container specifically for pharmacophore queries.

    # At this point, my pcore branch is synced up with the SF master
    # do some work on this branch, make it available, hear that it's merged into SF master

    $ git checkout master
    Switched to branch "master"
    $ git pull
    # At this point I should have my commits from the pcore branch
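The pull-then-rebase routine described above could be wrapped in a single shell function. This is only a sketch under a few assumptions: the name sync_branch is hypothetical, the local master tracks the remote master (as it does after a plain clone), and --ff-only is my addition as a safeguard, on the grounds that a master which never receives local work should always fast-forward.

```shell
# sync_branch BRANCH: update the local master, then replay BRANCH on top of it.
# Hypothetical helper for illustration; assumes master tracks the remote master.
sync_branch() {
    branch=$1
    if [ -z "$branch" ]; then
        echo "usage: sync_branch BRANCH" >&2
        return 2
    fi
    # Bring the local master in line with the remote; fail rather than merge
    # if master has somehow picked up local commits.
    git checkout -q master || return 1
    git pull -q --ff-only || return 1
    # Switch to the topic branch and replay its commits on the updated master.
    git checkout -q "$branch" || return 1
    git rebase -q master
}
```

Calling sync_branch pcore inside the clone would leave pcore checked out and rebased. If the rebase hits a conflict, the function returns with the rebase still in progress, so it can be resolved or abandoned with git rebase --abort.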

The key thing is that rather than merge my branches into my local master and then point people to that, I instead point people to my branches. After my branch has been merged with the Sourceforge master, I pull the changes into my local master and then rebase my branches on it. While this does seem a little convoluted, it lets me work on multiple projects without conflating them in my local master.
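With several topic branches in flight, the final rebasing step could be looped over every local branch. The following is a minimal sketch, not a tested part of my setup: rebase_all_on_master is a hypothetical name, it assumes the local master has already been pulled, and it stops at the first branch whose rebase does not apply cleanly.

```shell
# rebase_all_on_master: replay every local branch except master on top of master.
# Hypothetical helper for illustration; run it after updating master with a pull.
rebase_all_on_master() {
    # Remember where we started so we can return there afterwards.
    start=$(git rev-parse --abbrev-ref HEAD) || return 1
    # Git branch names cannot contain whitespace or glob characters,
    # so plain word splitting of the list is safe here.
    for b in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
        [ "$b" = "master" ] && continue
        echo "rebasing $b onto master"
        git checkout -q "$b" || return 1
        if ! git rebase -q master; then
            echo "rebase of '$b' stopped; resolve it or run: git rebase --abort" >&2
            return 1
        fi
    done
    git checkout -q "$start"
}
```

Commits that were already merged upstream are normally skipped by the rebase, so a branch that has been fully accepted should end up pointing at the same commit as master.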

Written by Rajarshi Guha

March 12th, 2009 at 9:30 pm

Posted in software
