So much to do, so little time

Trying to squeeze sense out of chemical data

Archive for the ‘rstats’ tag

A Report from a Stranger in a Strange Land

without comments

I just got back from ACoP7, the yearly meeting of the International Society of Pharmacometrics (ISoP). Now, I don’t do any PK/PD modeling (hence the “strange land”) but was invited to talk about our high throughput screening platform for drug combinations. I also hoped to learn a little more about this field as well as get an idea of the state of quantitative systems pharmacology (QSP). This post is a short summary of some aspects of the meeting and the PK/PD field that caught my eye, especially as an outsider to the field (hence the “stranger”).

The practice of PK/PD is clearly quite a bit downstream in the drug development pipeline from where I work, though it can be beneficial to keep PK/PD aspects in mind even at the lead discovery/optimization stages. However, I did come across a number of talks and posters that were attempting to bridge pre-clinical and clinical stages (and in some cases, even making use of in vitro data). As a result, the types of problems being considered were interesting and varied, ranging from models of feeding to predict weight loss/gain in neonates to analyzing drug exposure using mechanistic models.

A lot of PK/PD problems are addressed using model-based methods, as opposed to machine learning methods (see Breiman, 2001). I have some familiarity with the types of statistics used, but in practice much of my work is better suited for machine learning approaches. However, I did come across nice examples of some methodologies that may be useful in QSAR type settings, including mixed effect models, IRT models and Bayesian methods. It was also nice to see a lot of people using R (ISoP even runs a Shiny server for members’ applications) and companies providing R solutions (e.g., Metrum, Mango). I also came across a nice poster (Justin Penzenstadler, UMBC) comparing various R packages for NLME modeling, as well as Stan, which seems like a good way to get into Bayesian modeling. Certainly worth exploring more.

The data used in a lot of PK/PD problems is also qualitatively (and quantitatively) different from my world of HTS and virtual screening. Datasets tend to be smaller and noisier, which makes them challenging to model (hence less focus on purely data driven, distribution-free M/L methods). A number of presentations showed results with quite wide CIs and significant variance in the observed properties. At the same time, models tend to be smaller in terms of features, which are usually driven by the disease state or the biology being modeled. This is in contrast to the 1000s of descriptors we deal with in QSAR. However, even with smaller feature sets I got the impression that feature selection (aka covariate selection) is a challenge.

Finally, I was interested in learning more about QSP. Having followed this topic on and off (my initiation was this white paper), I wasn’t really up to date and was a bit confused between QSP and physiologically based PK (PBPK) models, and hoped this meeting would clarify things a bit. Some of the key points I was able to garner:

  • QSP models could be used to model PK/PD but don’t have to. This seems to be the key distinction between QSP and PBPK approaches
  • Building a comprehensive model from scratch is daunting, and speaking to a number of presenters, it turns out many tend to reuse published models and tweak them for their specific system. (this also leads one to ask what is “useful”?)
  • Some models can be very complex, with 100s of ODEs. There were posters that went with such large models, but also some that went with smaller, simplified models. It seems that one can ask “How big a model should you go for to get accurate results?” as well as “How small a model can you get away with to get accurate results?”. Model reduction/compression seems to be an actively addressed topic
  • One of the biggest challenges for QSP models is the parametrization, which appears to be a mix of literature hunting, guesswork and some experiment. Examples where the researcher used genomic or proteomics data (e.g. Jaehee Shim, Mount Sinai) were more familiar to me, but nonetheless daunting to someone who would like to use some of this work but is not an expert in the field (or a grad student who doesn’t sleep). PK/PD models tend to require fewer parameters, though PBPK models are closer to QSP approaches in terms of their parameter space.
  • Where does one find models and parameters in reusable (aka machine readable) formats? This is an open problem and approaches such as DDMoRE are addressing this with a repository and annotation specifications.
  • Much of QSP modeling is done in Matlab (and many published models are in the form of Matlab code, rather than a more general/abstract model specification). I didn’t really see alternative approaches (e.g., agent based models) to QSP models beyond the ODE approach.
  • ISoP has a QSP SIG which looks like an interesting place to hang out. They’ve put out some papers that clarify aspects of QSP (e.g., a QSP workflow) and lay out a roadmap for future activities.

So, QSP is very attractive since it has the promise of supporting mechanistic understanding of drug effects but also allowing one to capture emergent effects. However, it appears to be very problem & condition specific and it’s not clear to me how detailed I’d need to get to reach an informative model. It’s certainly not something I can pull off-the-shelf and include in my projects. But definitely worth tracking and exploring more.

Overall, it was a nice experience and quite interesting to see the current state of the art in PK/PD/QSP and learn about the challenges and successes that people are having in this area. (Also, ISoP really should make abstracts publicly linkable).

Written by Rajarshi Guha

October 26th, 2016 at 9:39 pm

rinchi – An R package to generate InChI’s and InChI Keys

with 4 comments

While trying to update rcdk on CRAN it was pointed out to me that usage of the library resulted in modifications to the user’s home directory. Specifically, this occurred when generating InChIs. The CDK makes use of jni-inchi, which in turn depends on JNATI, which enables Java code to work with native libraries in a platform independent fashion. As part of this, it creates $HOME/.jnati, which is a no-no for CRAN packages. To resolve this, the latest version of rcdklibs excludes the InChI module and its dependencies. Hopefully rcdk and rcdklibs will now pass CRAN QC.

To access InChI functionality in R you can use the rinchi package, which is hosted on Github. Since it will modify the user’s home directory, it cannot be hosted on CRAN. However, it’s easy enough to install:

library(devtools)
# current devtools releases take the repository as "user/repo"
install_github("rajarshi/cdkr", subdir="rinchi")

Importantly, if all you need is to go from SMILES to InChI, there is no need to install rcdk as well. So the following works:

library(rinchi)
inchi <- get.inchi('CCC')
inchik <- get.inchi.key('CCC')

But if you do have a molecule object obtained via rcdk, you can also pass that in to get an InChI or InChI key representation.
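
As a rough sketch of that second route, assuming both rcdk and rinchi are installed and that get.inchi and get.inchi.key accept a molecule object as described above, the call would look something like this. The only function used here that is not in the post is parse.smiles, which rcdk provides for turning SMILES strings into molecule objects:

```r
library(rcdk)    # provides parse.smiles()
library(rinchi)  # provides get.inchi() and get.inchi.key()

# parse.smiles() returns a list of molecule objects, one per input SMILES
mol <- parse.smiles("CCC")[[1]]

# Pass the molecule object instead of a SMILES string
inchi  <- get.inchi(mol)
inchik <- get.inchi.key(mol)
```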

Written by Rajarshi Guha

August 30th, 2014 at 6:23 pm

Posted in software,cheminformatics


Chunking lists in R

without comments

A common task for me is to run database queries on gene symbols or compound identifiers. This involves constructing an SQL query as a string and sending that off to the database. In the case of the ROracle package, the query strings are limited to a 1000 (?) or so characters. This means that directly querying for a thousand identifiers won’t work, and going through the list of identifiers one at a time is inefficient. What we need in this situation is to “chunk” the list (or vector) of identifiers and work on individual chunks. With the help of the itertools package, this is very easy:

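library(itertools)  # also attaches iterators, which provides nextElem()

n <- 1:11
chunk.size <- 3
it <- ihasNext(ichunk(n, chunk.size))
while (itertools::hasNext(it)) {
  achunk <- unlist(nextElem(it))
  print(achunk)  # placeholder: work on the chunk here
}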
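
To tie the chunking back to the query use case, here is a minimal sketch in base R (no itertools needed) that splits a vector of identifiers and builds one query string per chunk. The identifiers, the table name genes and the column name symbol are made up for illustration, and pasting values into SQL like this is only reasonable for trusted input:

```r
# Hypothetical identifiers; table and column names below are also assumptions
ids <- c("TP53", "BRCA1", "EGFR", "KRAS", "MYC", "PTEN", "AKT1")
chunk.size <- 3

# Split the vector into consecutive chunks of at most chunk.size elements
chunks <- split(ids, ceiling(seq_along(ids) / chunk.size))

# Build one query string per chunk, quoting each identifier
queries <- vapply(chunks, function(chunk) {
  sprintf("SELECT * FROM genes WHERE symbol IN (%s)",
          paste0("'", chunk, "'", collapse = ", "))
}, character(1))
```

Each element of queries can then be sent to the database in turn and the results combined, for example with rbind.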

Written by Rajarshi Guha

July 5th, 2012 at 2:22 pm

Posted in software
