So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘community’ tag

MIOSS Workshop Wrap Up


The last few days I’ve been at the EBI, attending the Molecular Informatics Open Source Software (MIOSS) workshop. As part of this trip to the UK, I’ve also had the opportunity to present some of the work my colleagues and I have done at the NCTT – thanks to Mark Forster for the invitation to speak at Syngenta and to John Chambers for having me speak to the ChEMBL group. At the workshop I presented my work on cheminformatics in R.

The focus of the workshop was to bring OSS developers and users from industry and academia/government together to hear about a variety of projects and discuss issues underlying the development and use of these projects. There were some very nice presentations – I won’t go into too much detail, but some highlights for me included:

  • Kevin Lawson (Syngenta) presented his work on LICSS – integrating the CDK with Excel. While I’m not a fan of Excel, it’s a necessary evil. I was quite surprised at the performance he achieved for substructure searches within Excel and the ability to access various functionalities of the CDK as Excel functions. While it probably won’t replace Accord or ChemOffice right now, it’s something to take a look at.
  • Mike Bodkin (Lilly) spoke about the use of KNIME at Lilly. They have built up an extensive collection of commercial and OSS nodes and it’s clear that KNIME is capable of giving Pipeline Pilot a run for its money. Thorsten Meinl then spoke of the OSS development of KNIME, and mentioned that they now support a collection of HCS and image analysis nodes (courtesy MPI Dresden). This is quite interesting, given that we’re ramping up our HCS capabilities at the NCTT.
  • Hans de Winter of Silicos spoke about the tools and services that their company has produced on top of OpenBabel (and contributed back to the community). Quite encouraging to see a cheminformatics company making money off the OSS stack.
  • Greg Landrum spoke about RDKit, presenting the RDKit-based cartridge for PostgreSQL. He showed some nice performance numbers, and it was good to see that they had gotten the coders who implemented the GiST indexing mechanisms to implement a GiST index for binary fingerprints.

In addition to these, there were other talks on OpenBabel, Cinfony, Taverna, fpocket and others. While I’ve known about many of these projects, it was useful to learn some of the details from the developers themselves.

A number of issues surrounding OSS development and use were discussed. For example, community development was regarded as a key factor in the success of OSS projects. Erik Lindahl of GROMACS fame spoke about the development model of GROMACS and how important community involvement has been to its success. Some other issues included the importance of (and lack of) good documentation, what makes people contribute to OSS and so on.

It was nice that industry participants made up about 50% of the group, and a number of industry-related issues also arose. For example, there were several discussions of business models based around OSS and how they can feed back into OSS projects. A common thread seemed to be that service and customization of OSS are good approaches to building businesses around the OSS stack, Silicos and Eagle Genomics being two prime examples.

The fact that there are industry users of OSS as well as industry members contributing back to OSS projects was very encouraging. An idea supported by a number of participants was some form of web site / wiki where such contributors and users could list themselves. (IMO, the Blue Obelisk wiki could be a candidate for this type of thing.) Sure, there’d usually be corporate and legal barriers to this type of thing, but if done, it would have a number of benefits – encouragement for project developers and an easily viewable precedent that would encourage other companies to use or participate in OSS projects, resulting in a positive feedback loop. With various pre-competitive collaboration efforts (e.g., Pistoia Alliance) popping up in the pharma industry, this is certainly possible.

Finally, it’s always good to meet up with old friends and also meet people whom I’ve only known over email. The social aspects of the workshop were very nice – helped greatly by excellent food and drink! Thanks to Mark for putting together a great meeting.

Written by Rajarshi Guha

May 6th, 2011 at 6:25 pm

Posted in cheminformatics


Stack Overflow – Not for Chemistry?


Rich Apodaca recently wrote a post highlighting StackOverflow – a community discussion site for software development – and suggesting that a similar type of site for chemists would not work. He also posted a follow-up listing some factors that make something like StackOverflow unlikely for the chemistry community. I had made a quick comment noting that one difference between the cultures of the chemistry and software communities was the possibility of commercialization. On thinking about it a little, this is not entirely correct, as both communities generate ideas and work that lead to commercialization.

But I think that the difference lies in the nature of the commercialization process. As Rich pointed out in his follow-up post, entrepreneurship and resources are two important sources of differences between the chemistry and software communities. In the latter community, two people can implement an idea with minimal resource investment and end up with a profitable product. In contrast, two chemists might come up with an idea, but in many cases, it will require significant investment in resources to get an initial product (and scale up would be a separate issue).

In that sense, commercialization in chemistry can be a longer process – and if that’s the case, it’s not surprising that we see the differences. In fact, if we’re comparing chemistry to some computer-related field, it seems that a comparison with computer hardware is more appropriate than with computer software, especially when we consider the costs involved in the commercialization process. (Though with FPGAs and chip fabs, a computer hardware startup is probably easier than a chemistry startup.)

Another factor that differentiates chemistry from computer software or hardware is that chemistry projects are not usually spare time projects. One can write software or design (basic) hardware as a spare time thing which, if it turns out to be feasible/useful/interesting, can be transformed into an actual product. Again, this goes back to the costs involved in testing out and implementing new ideas without institutional backing.

Rich’s other points are also good and I think his comments on patents vs copyrights are especially important. However, I’m not so sure about the issue of history – obviously, history brings tradition (baggage?), but is this really a big factor? It seems that the implications of history overlap to a large degree with “established communication channels”.

Written by Rajarshi Guha

May 2nd, 2009 at 12:26 am

Posted in software
