Chemistry, Clouds, Collaboration (Part 2) | So much to do, so little time

In my previous post I talked mainly about why there isn’t a large showing of chemistry in the cloud. It was based on Deepak’s post and a FriendFeed thread, but really only addressed the first two words of the title. The issue of collaboration came up in the FriendFeed thread via some comments from Matthew Todd. He asked:

I am also interested in why there are so few distributed chemistry collaborations – i.e. those involving the actual synthesis of chemical compounds and their evaluation. Does it come down to data sharing tools?

The term “distributed chemistry collaborations” arises, partly, from a recent paper. But one might say that the idea of distributed collaborations is already here. Chemists have been collaborating in a variety of ways, though many of these collaborations are small and focused (say between two or three people).

I get the feeling that Matthew is talking about larger collaborations, something along the lines of the CombiUgi project or the ONS Challenge. I think there are a number of factors that might explain why we don’t see more such large, distributed chemistry collaborations.

First, there is the issue of IP and credit. How will it get apportioned? If each collaborator is providing a specific set of skills, I can see it being relatively simple. But then it also sounds like pretty much any current collaboration. What happens when multiple people are synthesizing different compounds? And you have multiple people doing assays? How is work divided? How is credit received? And are large, loosely managed groups even efficient? Of course, one could compare the scenario to many large Open Source projects and their management issues.

Second, I think data sharing tools are a factor. How do collaborations (especially those without an informatics component) efficiently share information? Probably Excel – but there are a number of efforts such as CDD and ChemSpider which are making it much easier for chemists to share chemical information.

A third factor, somewhat related to the previous point, is that academic chemistry has somewhat ignored the informatics aspects of chemistry (both as an infrastructure topic and as a research area). I think this is partly related to the scale of academic chemistry. Certainly, many topics in chemical research do not require informatics capabilities (compared to, say, ab initio computational capabilities). But there are a number of areas, such as the type that Matthew notes, that can greatly benefit from an efficient informatics infrastructure. I certainly won’t say that it’s all there and ready to use – but I think it’s important that cheminformatics plays a role. In this sense, one could say that there would be many more distributed collaborations if chemists knew that there was an infrastructure that could help their efforts. I will also note that it’s not just about infrastructure – while important, it’s also pretty straightforward IT (given some domain knowledge). I do think that there is a lot more to cheminformatics, beyond just setting up databases, that can support bench chemistry efforts. Industry realizes this. Academia hasn’t so much (at least not yet).

Which leads me to the fourth factor, which is social. Maybe the reason for the lack of such collaborations is that chemists just don’t have a good way of getting the word out that they are available and/or interested. Certainly, things like FriendFeed are a venue for this to happen, but given that most academic chemists are conservative, it may take time for this to pick up speed.

4 thoughts on “Chemistry, Clouds, Collaboration (Part 2)”

Mat Todd says:

February 23, 2009 at 2:02 am

Excellent summary Rajarshi. I think you’re right that chemists generally are not aware of the immense advances taking place in cheminformatics, towards tools that will have real, collaborative value.

Perhaps Chemistry is unusual in that it has for a long time been supported by huge, wealthy organisations (ACS, RSC) that provide privately-funded, monolithic data sources. We have not had to worry too much! These organisations have a strong hold on subscriber-pays publishing also – there are few open access chemistry journals (Chemistry Central, Beilstein J, PLoS One (technically)). Perhaps this has stifled our ability to collaborate as large groups efficiently? Yes – collaboration is the nuts and bolts of chemistry, but not as diffuse groups in the cloud. Jean-Claude Bradley has been doing exceptional work on his distributed projects, and we’re trying to organise and host chemical projects on The Synaptic Leap.

We need the tools. I sense a grant proposal forming…

More on chemistry and the data web : business|bytes|genes|molecules says:

February 23, 2009 at 7:32 am

[…] to a post on Friendfeed. That’s becoming a fairly active discussion and led to a couple of posts by […]

Rajarshi Guha says:

February 23, 2009 at 11:29 pm

Thanks Matthew. While the closed data sources represented by ACS and others do represent a problem, my impression has been that by not being aware of the informatics aspects, many chemists have not considered the broader possibilities.

Also, to be fair, cheminformatics, modeling, etc. do have their share of problems. They are certainly not a panacea. Unfortunately, a cheminformatician only needs to fail once, after which they lose credibility.

It’s a two-way street – cheminformatics can’t just throw tools and UIs at the chemists without considering what they really need and find useful. At the same time, chemists should understand that cheminformatics and modeling won’t give them the magic compound out of thin air!

beilstein database says:

March 28, 2010 at 3:43 am

[…] for ACS or Beilstein services? So I wouldn’t predict explosive growth like Flickr or Google …Chemistry, Clouds, Collaboration (Part 2) at So much to do …Trying to squeeze sense out of chemical data … hold on subscriber-pays publishing also – there are […]
