So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘git’ tag

rcdk On GitHub

with 2 comments

I’ve been working on the rcdk and rcdklibs R packages that integrate the CDK into the R environment. For the past year or two they’ve been hosted on a service that has a nice feature of nightly builds of R packages. Unfortunately its version control system is Subversion. Having used Git for my other projects, this was getting painful. So rcdk and rcdklibs are now hosted at GitHub as the cdkr project. This makes development much smoother, but I do lose the automated build feature. Sometime in the future, I’ll set up the repository to provide direct links to the packages (though they will also be available on CRAN).

Written by Rajarshi Guha

May 3rd, 2010 at 3:42 am

Posted in software,cheminformatics


CDK Development with Git

without comments

As announced on the cdk-devel mailing list, the project has shifted to Git as its version control system. While the SVN repository is still available, it’s expected that all development from now on will be done using Git branches. To get things going Egon has put up a very nice page on the CDK wiki.

The use of Git makes development much more enjoyable – if nothing else, fast branching allows one to easily test out multiple, disparate ideas and cleanly merge them back into the mainline code. A side effect is that it also allows easy code review, which is extremely valuable in a distributed project such as the CDK. In this post I thought it’d be useful to describe my workflow using Git.

Set up

I won’t say much about this as the wiki page provides sufficient detail. In brief, we make a clone of the CDK master (located on Sourceforge)

git clone git://

This gives us a directory called cdk. This is your local copy of the CDK master branch, and you should periodically do a git pull to make sure it’s in sync with what is on Sourceforge. Now, rather than work directly on this local copy, we make a branch for each idea we want to try out.

git checkout -b myIdea1

This way you can be working on multiple projects, each independent of the others. For example my setup looks like

[rguha@localhost cdk]$ git branch -a
* pcore

The asterisk indicates I’m working on my pcore branch. master represents my local copy of the Sourceforge master.
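To try the branch-per-idea setup without touching a real checkout, here is a minimal sketch that uses a throwaway repository. The sandbox directory, the file names and the second branch name (myIdea2) are made up for illustration; only git checkout -b and git branch come from the description above.

```shell
#!/bin/sh
# Sketch: one topic branch per idea, in a throwaway repository.
# The sandbox, file names and identity below are hypothetical.
set -eu

demo=$(mktemp -d)                         # scratch directory, not the CDK tree
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.org"

echo "base" > README
git add README
git commit -q -m "Initial commit"

# Each idea gets its own branch, started from master.
git checkout -q -b myIdea1
echo "first idea" > idea1.txt
git add idea1.txt
git commit -q -m "Try out idea 1"

git checkout -q master
git checkout -q -b myIdea2
echo "second idea" > idea2.txt
git add idea2.txt
git commit -q -m "Try out idea 2"

# List the local branches; the asterisk marks the one checked out.
git branch
```

Because each branch starts from master, the file committed on myIdea1 is not present while myIdea2 is checked out, which is what keeps the ideas independent.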

Making branches available

Given a git repository there are a number of ways you can make it available. Approaches include emailing patches, creating bundles or setting up your own Git server. The last is worthwhile if you have a constant connection and multiple developers working on your code. Such a server can be based on Apache and WebDAV and is described here. Another alternative is to use Gitosis, which makes life quite easy, though I haven’t tried it myself.
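Of the approaches listed above, the bundle route is the easiest to try without any server. The following is a rough sketch, not a description of how the CDK project exchanges code: the sandbox directories, the identity and the file names are assumptions, and pcore is used only as an example branch name.

```shell
#!/bin/sh
# Sketch: hand a topic branch to someone else as a single bundle file.
# All paths and names below are hypothetical.
set -eu

work=$(mktemp -d)
author="$work/author"
reviewer="$work/reviewer"

git init -q "$author"
cd "$author"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo Author"
git config user.email "author@example.org"
echo "base" > README
git add README
git commit -q -m "Initial commit"

# The reviewer starts from the same history as the author.
git clone -q "$author" "$reviewer"

# The author does some work on a topic branch.
git checkout -q -b pcore
echo "pharmacophore work" > pcore.txt
git add pcore.txt
git commit -q -m "Add pcore work"

# Pack only the commits that pcore has and master lacks.
git bundle create "$work/pcore.bundle" master..pcore

# The reviewer checks the bundle and fetches the branch from it.
cd "$reviewer"
git bundle verify "$work/pcore.bundle"
git fetch -q "$work/pcore.bundle" pcore:pcore
git log --oneline pcore
```

The bundle file can be emailed or copied like any other file. The recipient needs to already have the commits it builds on (here, master), which is what git bundle verify checks.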

In my case, I use a shortcut (this is based on a Linux server). When I want to make stuff available to the world, I simply run git-daemon and point it to my local CDK repository

sudo -u git git daemon --base-path=/home/rguha/src/java/cdk.git --export-all

Here --base-path names the directory that contains the repository directory, and --export-all indicates that every repository under that path should be made available. In my case, there’s just the CDK repository. I created a user called git so that its permissions are a little restricted. However, this is likely not a very secure setup. I haven’t checked whether other users can write to my repository (which I’d rather not have them do!). Also, since I’m not going through xinetd I miss out on those benefits as well. Also see here on the use of git-daemon.

Having said that, this setup is quite handy – since whenever I want others to look at a branch of mine (say for merging with the Sourceforge master) I can simply ask them to do

git pull git:// pcore

and they will be able to pull all the latest commits from the pcore branch. If the work in this branch gets accepted I can simply shut down the daemon, until I decide to make some new work available to the world. Since this is my own machine and I’m the only developer working on it, I don’t need a persistent Git server, so this works great.

My workflow

So, now that I can easily make branches available to the world, how do I arrange my workflow? The approach I’ve taken is the following. My local master never has work done on it directly. The only operation I perform is pull. In other words I just make sure that it’s in sync with the Sourceforge master. (Of course for really minor edits I might just make them in my local master and make them available). All other ideas are tested out in a branch. Before working on a branch I’d switch to master, do a pull, switch to a branch I’d like to work on and then do a rebase (see here for a nice explanation) in this branch, so that it’s in sync with the Sourceforge master.

Now, the CDK policy is that only the branch manager will be able to write to the Sourceforge master. Thus I cannot push changes to it and must request a review of my code – which I can do by starting the daemon as described above and pointing people to the branch. Once it’s accepted and merged into the Sourceforge master I can then pull to my local master. As an example, here are the commands that make up my workflow

$ git checkout master
Switched to branch "master"

$ git pull
remote: Counting objects: 13, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 8 (delta 6), reused 0 (delta 0)
Unpacking objects: 100% (8/8), done.
From git://
   f927231..a4f56f2  cdk-1.2.x  -> origin/cdk-1.2.x
   5bab4a2..2a2bfe6  master     -> origin/master
Updating 5bab4a2..2a2bfe6
Fast forward
 README                       |    2 +-
 src/META-INF/extra.datafiles |    2 --
 2 files changed, 1 insertions(+), 3 deletions(-)

$ git checkout pcore
Switched to branch "pcore"

$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: provided toString methods for the pcore query classes
Applying: Added test methods for the new toString methods. Also added test method annotations
/home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:76: trailing whitespace.
        Assert.assertEquals("AC::Amine [[CX2]N]::aromatic [c1ccccc1]::blah [C]::[54.74 - 54.74]", repr);        
/home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:110: trailing whitespace.
        Assert.assertEquals("DC::Amine [[CX2]N]::aromatic [c1ccccc1]::[1.0 - 2.0]", repr);        
warning: 2 lines add whitespace errors.
Applying: Updated the toString tests
/home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:15: trailing whitespace.
        Assert.assertEquals(0, repr.indexOf("AC::Amine [[CX2]N]::aromatic [c1ccccc1]::blah [C]::[54.74 - 54.74]"));        
/home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:28: trailing whitespace.
        String repr = qbond1.toString();        
/home/rguha/src/java/cdk.git/cdk/.git/rebase-apply/patch:29: trailing whitespace.
        Assert.assertEquals(0, repr.indexOf("DC::Amine [[CX2]N]::aromatic [c1ccccc1]::[1.0 - 2.0]"));        
warning: 3 lines add whitespace errors.
Applying: Refactored to provide a query container specifically for pharmacophore queries.

# At this point, my pcore branch is synced up with the SF master
# do some work on this branch, make it available, hear that it's merged into SF master

$ git checkout master
Switched to branch "master"

$ git pull

# At this point I should have my commits from the pcore branch

The key thing is that rather than merge my branches into my local master and then point people to that, I instead point people to my branches. After my branch has been merged with the Sourceforge master, I pull the changes into my local master and then rebase my other branches on top of it. While this does seem a little convoluted, it lets me work on multiple projects without conflating them in my local master.
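The same cycle can be rehearsed in a sandbox before trying it on a real project. This is a minimal sketch under stated assumptions: a local bare repository stands in for the Sourceforge master, a second clone plays the branch manager, and every path, identity and file name is hypothetical. The --ff-only flag is my addition; it makes the pull fail rather than merge if the local master has somehow diverged.

```shell
#!/bin/sh
# Sketch: keep local master as a pure mirror, do all work on a topic branch,
# and rebase the topic branch whenever upstream moves.
# All paths and names below are hypothetical.
set -eu

work=$(mktemp -d)
upstream="$work/upstream.git"   # stands in for the Sourceforge master
seed="$work/seed"               # stands in for the branch manager
mine="$work/cdk"                # my local clone

git init -q --bare "$upstream"
git --git-dir="$upstream" symbolic-ref HEAD refs/heads/master

# The branch manager publishes the first commit.
git init -q "$seed"
cd "$seed"
git symbolic-ref HEAD refs/heads/master
git config user.name "Branch Manager"
git config user.email "manager@example.org"
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git remote add origin "$upstream"
git push -q origin master

# I clone and start a topic branch; master itself is never edited.
git clone -q "$upstream" "$mine"
cd "$mine"
git config user.name "Demo Developer"
git config user.email "dev@example.org"
git checkout -q -b pcore
echo "pcore work" > pcore.txt
git add pcore.txt
git commit -q -m "Work on pcore"

# Meanwhile upstream moves on.
cd "$seed"
echo "v2" > README
git commit -q -am "Update README"
git push -q origin master

# Sync: update master, then replay the topic branch on top of it.
cd "$mine"
git checkout -q master
git pull -q --ff-only
git checkout -q pcore
git rebase -q master
git log --oneline
```

After the rebase, the pcore commit sits directly on top of the updated master, so anyone pulling the branch sees only the topic work as new history.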

Written by Rajarshi Guha

March 12th, 2009 at 9:30 pm

Posted in software


First Steps with Git

with 3 comments

With all the stuff I’ve been hearing about Git I’ve been looking to play around with it. While I have been hosting my own Subversion repo on my office machine, the use of GitHub seemed like a good way to play with Git and also have a stable external repo.

So right now the CDKDescUI project has been shifted into Git and is located here. I’ve also shifted my REST web services here.

Written by Rajarshi Guha

January 5th, 2009 at 5:50 pm

Posted in software
