# A Report from a Stranger in a Strange Land

I just got back from ACoP7, the yearly meeting of the International Society of Pharmacometrics (ISoP). Now, I don’t do any PK/PD modeling (hence the “strange land”) but was invited to talk about our high throughput screening platform for drug combinations. I also hoped to learn a little more about this field as well as get an idea of the state of quantitative systems pharmacology (QSP). This post is a short summary of some aspects of the meeting and the PK/PD field that caught my eye, especially as an outsider to the field (hence the “stranger”).

The practice of PK/PD is clearly quite a bit downstream in the drug development pipeline from where I work, though it can be beneficial to keep PK/PD aspects in mind even at the lead discovery/optimization stages. However, I did come across a number of talks and posters that were attempting to bridge pre-clinical and clinical stages (and in some cases, even making use of in vitro data). As a result, the types of problems being considered were interesting and varied, ranging from models of feeding to predict weight loss/gain in neonates to analyzing drug exposure using mechanistic models.

A lot of PK/PD problems are addressed using model-based methods, as opposed to machine learning methods (see Breiman, 2001). I have some familiarity with the types of statistics used, but in practice much of my work is better suited to machine learning approaches. However, I did come across nice examples of some methodologies that may be useful in QSAR-type settings, including mixed effect models, IRT models and Bayesian methods. It was also nice to see a lot of people using R (ISoP even runs a Shiny server for members’ applications) and companies providing R solutions (e.g., Metrum, Mango). I also came across a nice poster (Justin Penzenstadler, UMBC) comparing various R packages for NLME modeling, as well as Stan, which seems like a good way to get into Bayesian modeling. Certainly worth exploring more.

The data used in a lot of PK/PD problems is also qualitatively (and quantitatively) different from my world of HTS and virtual screening. Datasets tend to be smaller and noisier, which makes them challenging to model (hence less focus on purely data-driven, distribution-free ML methods). A number of presentations showed results with quite wide CIs and significant variance in the observed properties. At the same time, models tend to be smaller in terms of features, which are usually driven by the disease state or the biology being modeled. This is in contrast to the thousands of descriptors we deal with in QSAR. However, even with smaller feature sets I got the impression that feature selection (aka covariate selection) is a challenge.

Finally, I was interested in learning more about QSP. Having followed this topic on and off (my initiation was this white paper), I wasn’t really up to date and was a bit confused between QSP and physiologically based PK (PBPK) models, and hoped this meeting would clarify things a bit. Some of the key points I was able to garner:

• QSP models could be used to model PK/PD but don’t have to. This seems to be the key distinction between QSP and PBPK approaches
• Building a comprehensive model from scratch is daunting, and speaking to a number of presenters, it turns out many tend to reuse published models and tweak them for their specific system. (this also leads one to ask what is “useful”?)
• Some models can be very complex, with hundreds of ODEs. There were posters that went with such large models but also some that went with smaller, simplified models. It seems that one can ask “How big a model should you go for to get accurate results?” as well as “How small a model can you get away with to get accurate results?”. Model reduction/compression seems to be an actively addressed topic.
• One of the biggest challenges for QSP models is the parametrization, which appears to be a mix of literature hunting, guesswork and some experiment. Examples where the researcher used genomic or proteomics data (e.g. Jaehee Shim, Mount Sinai) were more familiar to me but nonetheless daunting to someone who would like to use some of this work but is not an expert in the field (or a grad student who doesn’t sleep). PK/PD models tend to require fewer parameters, though PBPK models are closer to QSP approaches in terms of their parameter space.
• Where does one find models and parameters in reusable (aka machine readable) formats? This is an open problem and approaches such as DDMoRE are addressing this with a repository and annotation specifications.
• Much of QSP modeling is done in Matlab (and many published models are in the form of Matlab code, rather than a more general/abstract model specification). I didn’t really see alternative approaches (e.g., agent based models) to QSP models beyond the ODE approach.
• ISoP has a QSP SIG which looks like an interesting place to hang out. They’ve put out some papers that clarify aspects of QSP (e.g., a QSP workflow) and lay out a roadmap for future activities.

So, QSP is very attractive since it has the promise of supporting mechanistic understanding of drug effects but also allowing one to capture emergent effects. However, it appears to be very problem & condition specific and it’s not clear to me how detailed I’d need to get to reach an informative model. It’s certainly not something I can pull off-the-shelf and include in my projects. But definitely worth tracking and exploring more.

Overall, it was a nice experience and quite interesting to see the current state of the art in PK/PD/QSP and learn about the challenges and successes that people are having in this area. (Also, ISoP really should make abstracts publicly linkable).

# From Algorithmic Fairness to QSAR Models

The topic of algorithmic fairness has started receiving a lot of attention due to the ability of predictive models to make decisions that might discriminate against certain classes of people. The reasons for this include biased training data, correlated descriptors, black box modeling methods or a combination of all three. Research into algorithmic fairness attempts to identify these causes (whether in the data or the methods used to analyze them) and alleviate the problem. See here, here and here for some interesting discussions.

I recently came across a paper from Adler et al on the topic of algorithmic fairness. Fundamentally, the authors were looking at descriptor influence in binary classification models. Importantly, they treat the models as black boxes and quantify the sensitivity of the model to feature subsets without retraining the model. Clearly, this could be useful in analyzing QSAR models, where we are interested in the effect of individual descriptors on the predictive ability of the models. While there has been work on characterizing descriptor importance, all of these approaches involve retraining the model with scrambled or randomized descriptors.

The core of Adler et al is their statement that

the information content of a feature can be estimated by trying to predict it from the remaining features.

Fundamentally, what they appear to be quantifying is the extent of multivariate correlations between subsets of features. They propose a method to “obscure the influence of a feature on an outcome” and using this, measure the difference in model prediction accuracy between the test set using the obscured variable and the original (i.e., unobscured) test set. Doing this for each feature in the dataset lets them rank the features. A key step of the process is to obscure individual features, which they term ε-obscurity. The paper presents the algorithms and also links to an implementation.

The authors test their approach on several datasets, including a QSAR-type dataset from the Dark Reactions Project. It would be interesting to compare this method, on other QSAR datasets, with simpler methods such as descriptor scrambling or resampling (from the same distribution as the descriptor) since these methods could be easily adapted to the black box assumption used by the authors.
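To make the comparison concrete, here is a minimal sketch of descriptor scrambling under the same black box assumption. It is not the ε-obscurity procedure from Adler et al. It simply shuffles one descriptor column at a time in the test set and records the drop in accuracy, without retraining. The function names, the toy model and the data are all hypothetical, and the model is assumed to be any callable that maps a row of descriptors to a class label.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the black-box model predicts the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)

def scrambling_influence(model, rows, labels, n_repeats=20, seed=0):
    """Rank features by the mean accuracy drop when one column is shuffled.

    The model is only called, never retrained (black-box assumption).
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Replace column j with its shuffled copy, leave the rest intact.
            scrambled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            total += baseline - accuracy(model, scrambled, labels)
        drops.append(total / n_repeats)
    # Largest drop first: these are the features the model leans on most.
    return sorted(enumerate(drops), key=lambda item: item[1], reverse=True)

# Hypothetical toy "model": class 1 when the first descriptor exceeds 0.5.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0

data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
labels = [toy_model(row) for row in rows]
ranking = scrambling_influence(toy_model, rows, labels)
```

In this toy case only the first descriptor should show a non-zero drop, since the model ignores the other two. Note that shuffling a column independently of the others breaks its correlations with them, so a descriptor whose information is duplicated elsewhere can still look important here; that is the situation the obscuring approach is meant to address.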

Furthermore, given that their motivation appears to be driven by capturing multivariate correlation, one could take a feature $$X_i$$ and regress it on all the other features $$X_j\ (j \neq i)$$. Repeating this for all $$X_i$$ would then allow us to rank the features in terms of the RMSE of the individual regressions. Features with low RMSE would represent those that are successfully estimated from the remaining features. This would test for (possibly non-linear, depending on the regression method) correlations within the dataset itself (which is conceptually similar to previous work from these authors) but not say anything about the model itself having learnt any such correlations. (Obviously, this works for numerical features only, but that is usually the case for QSAR models).
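A minimal sketch of this ranking is given below, assuming plain linear least squares with an intercept, so it only picks up linear relationships. The helper names and the toy data are hypothetical. RMSE values are only comparable across features on a common scale, so in practice the descriptors would be standardized first. The tiny solver uses the normal equations and will fail on exactly collinear predictors; a real analysis would use a numerical library instead.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def regression_rmse(rows, i):
    """RMSE of an ordinary least-squares fit of feature i on all other features."""
    design = [[1.0] + [v for j, v in enumerate(row) if j != i] for row in rows]
    target = [row[i] for row in rows]
    p = len(design[0])
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(d[r] * d[c] for d in design) for c in range(p)] for r in range(p)]
    aty = [sum(d[r] * y for d, y in zip(design, target)) for r in range(p)]
    beta = solve(ata, aty)
    residuals = [y - sum(b * v for b, v in zip(beta, d)) for d, y in zip(design, target)]
    return math.sqrt(sum(e * e for e in residuals) / len(rows))

def rank_by_predictability(rows):
    """Features sorted from most to least predictable (lowest RMSE first)."""
    scores = [(i, regression_rmse(rows, i)) for i in range(len(rows[0]))]
    return sorted(scores, key=lambda item: item[1])

# Hypothetical data: x2 is nearly a linear combination of x0 and x1,
# while x3 is independent of everything else.
rng = random.Random(0)
rows = []
for _ in range(100):
    x0, x1, x3 = rng.random(), rng.random(), rng.random()
    x2 = 2 * x0 - x1 + rng.gauss(0, 0.01)
    rows.append([x0, x1, x2, x3])
ranking = rank_by_predictability(rows)
```

Here the three mutually dependent features should come out with small RMSE and the independent one should sit at the bottom of the ranking. Swapping the linear fit for a non-linear regressor would be the obvious next step for picking up non-linear dependencies.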

Finally, a question that seemed to be unanswered in the paper was, what does one do when one identifies a feature that is important (or, that can be predicted from the other features)? In the context of algorithmic fairness, such a feature could lead to discriminatory outcomes (e.g., zipcode as a proxy for race). What does one do in such a case?

# Database Licensing & Sustainability

Update (07/28/16): DrugBank/OMx have updated the licensing conditions for DrugBank data in response to concerns raised earlier by various people and groups. See here for a detailed response from Craig Knox.

A few days back I came across, via my Twitter network, the news that DrugBank had changed their licensing policy to CC BY-NC-SA 4.0. In itself, this is not a remarkable change (though one could argue about the NC clause since, as John Overington points out, the distinction between commercial and non-commercial usage can be murky). However, on top of this license, the EULA listed a number of more restrictive conditions on reuse of the data. See this thread on ThinkLab for a more detailed discussion and breakdown.

This led to discussion amongst a variety of people regarding the sustainability of data resources. In this case, while DrugBank was (and is) funded by federal grants, these are not guaranteed in perpetuity. And thus DrugBank, and indeed any resource, needs to have a plan to sustain itself. Charging for commercial access is one such approach. While it can be problematic for reuse and other Open projects, one cannot fault the developers if they choose a path that enables them to continue to build upon their work.

Interestingly, the Guide to Pharmacology resource posted a response to the DrugBank license change, in which they don’t comment on the DrugBank decision but do point out that

The British Pharmacological Society (BPS) has committed support for GtoPdb until 2020 and the Wellcome Trust support for GtoImmuPdb until 2018. Needless to say the management team (between, IUPHAR, BPS and the University of Edinburgh) are engaged in sustainability planning beyond those dates. We have also just applied for UK ELIXIR Node consideration.

So it’s nice to see that the resource is completely free of any onerous restrictions until 2020. I have no doubt that the management team will be working hard to secure funding beyond that date. But in case they don’t, will their licensing also change to support some form of commercialization? Certainly, other resources are going down that path. John Overington pointed to BioCyc switching to a subscription model.

So the sustainability of data resources is an ongoing problem, and will become a bigger issue as the links between resources grow over time. Economic considerations would suggest that permanent funding of every database cannot happen.

So clearly, some resources will win and some will lose, and the winners will not stay winners forever.

### Open source software & transferring leadership

However, in contrast to databases, many Open Source software projects do continue development over pretty long time periods. Some of these projects receive public funding and also provide dual licensing options, allowing for income from industrial users.

However, there are others which are not heavily funded, yet continue to develop. My favorite example is Jmol, which has been in existence for more than 15 years and has remained completely Open Source. One of the key features of this project is that the leadership has passed from one individual to another over the years, starting I think with Dan Gezelter, then Bradley Smith, Egon Willighagen, Miguel Rojas and currently Bob Hanson.

Comparing Open software to Open databases is not fully correct. But this notion of leadership transition is something that could play a useful role in sustaining databases. Thus, if group X cannot raise funding for continued development, maybe group Y, which has funding and obviously benefits from the database, could take over development and maintenance.

There are obvious reasons that this won’t work – maybe the expertise resides only in group X? I doubt this is really an issue, at least for non-niche databases. One could also argue that this approach is a sort of proto-crowdsourcing approach. While crowdsourcing did come up in the Twitter thread, I’m not convinced this is a scalable approach to sustainability. The “diffuse motivation” of a crowd is quite distinct from the “focused motivation” of a dedicated group. And on top of that, many databases are specialized and the relevant crowd is rather small.

One ultimate solution is that governments host databases in perpetuity. This raises a myriad of issues. Does it imply storage and no development? Is this for all publicly funded databases? Or a subset? Who are the chosen ones? And of course, how long will the government pay for it? The NIH Commons, while not being designed for database persistence, is one such prototypical infrastructure that could start addressing these questions.

In conclusion, the issue of database sustainability is problematic and unsolved and the problem is only going to get worse. While unfortunate for Open science (and science in general) the commercialization of databases will always be a possibility. One hopes that in such cases, a balance will be struck between income and free (re)usage of these valuable resources.

# SLAS 2017: Let There Be Light: Informatics Approaches to Exploring the Dark Genome

I’m organizing a symposium at the 2017 SLAS meeting in Washington, D.C., in the Data Analysis and Informatics track. The symposium focuses on informatics approaches that shed light on and explore the dark genome. The description is given below, and I hope you’ll consider submitting an abstract.

With efforts such as the NIH-funded Illuminating the Druggable Genome (IDG) program, there is great interest and a pressing need to understand the “dark genome” — the subset of genes that have little to no information about them in the literature or databases. This session will focus on current efforts by members of the IDG program and the community in general on developing informatics resources for data aggregation and integration, target prioritization and platform development. In addition, topics such as characterization of druggability and novel approaches to connecting heterogeneous datasets that allow us to shed light on the dark genome will be considered.

The deadline is Aug 8, 2016 and you can submit an abstract here.

# Differential Dose Response – Some More Exploration

This is a follow on to my previous post that described a recent paper where we explored a few ways to characterize the differential activity of small molecules in dose response screens. In this post I wanted to highlight some aspects of this type of analysis that didn’t make it into the final paper.

TL;DR there’s more to differential analysis of dose response data than thresholding and ranking.

### Comparing Model Fits

One approach to characterizing differential activity is to test whether the curve fit models (in our case 4-parameter Hill models) are indistinguishable or not. While traditionally, ANOVA could be used to test this, it assumes that the models being compared are nested. This is not the case when testing for effects of different treatments (i.e., same model, but different datasets). As a result we first considered the use of AIC – but even then, applying this to the same model built on different datasets is not really valid.

Another approach (described by Ritz et al) that we considered was to refit the curves for the two treatments simultaneously using replicates, and determine whether the ratio of the AC50s (termed the Selectivity Index or SI) from the two models was different from 1.0. We can then test the null hypothesis that the SI equals 1.0 and determine whether the observed difference is statistically significant. The drawback is that it, ideally, requires that the curves differ only in potency. In practice this is rarely the case, as effects such as toxicity might cause a shift in the response at low concentrations, partial efficacy might cause incomplete curves at high concentrations and so on.

We examined this approach by fitting curves such that the top and bottom of the curves were constrained to be identical in both treatments and only the Hill slope and AC50 were allowed to vary.
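As a rough illustration of the quantities involved, the sketch below writes out one common parameterization of the 4-parameter Hill model and a simple Wald-type test on the SI. This is not the procedure from Ritz et al and not the code used for the paper. It assumes that the two log10 AC50 estimates and their standard errors are already available from a curve fit and that the estimates are independent and approximately normal, which a joint fit with shared top and bottom does not guarantee. All names and numbers are hypothetical.

```python
import math

def hill(conc, bottom, top, log_ac50, slope):
    """Four-parameter Hill model: response at concentration conc (same units as AC50)."""
    return bottom + (top - bottom) / (1.0 + (10 ** log_ac50 / conc) ** slope)

def selectivity_index_test(log_ac50_a, se_a, log_ac50_b, se_b):
    """Wald z-test of H0: SI = AC50_a / AC50_b = 1, carried out on the log10 scale.

    Assumes independent, approximately normal log10 AC50 estimates.
    Returns (SI, z statistic, two-sided p-value).
    """
    diff = log_ac50_a - log_ac50_b              # log10(SI)
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return 10 ** diff, z, p_value

# Hypothetical fits: same top and bottom, AC50 of 1 uM versus 10 uM.
mid_response = hill(1e-6, bottom=0.0, top=100.0, log_ac50=-6.0, slope=1.2)
si, z, p = selectivity_index_test(-6.0, 0.1, -5.0, 0.1)
```

Working on the log scale turns the ratio into a difference, which is why the null hypothesis SI = 1 becomes a test of whether the difference in log AC50 is zero. With many compounds, the resulting p-values would still need a multiple testing correction.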

After appropriate correction, this identified molecules for which the null hypothesis that the SI equals 1.0 was rejected at p < 0.05. Independent and constrained curve fits for two compounds are shown alongside. While the constraint of equal top and bottom for both curves does lead to some differences compared to independent fits (especially from the point of view of efficacy), the current data suggests that the advantage of such a constraint (allowing robust inference on the statistical significance of SI) outweighs the disadvantages.

### Variance Stabilization

Finally, given the rich literature on differential analysis for genomic data, our initial hope was to simply apply methods from that domain to the current problems. However, variance stabilization becomes an issue when dealing with small molecule data. It is well known from gene expression experiments that the variance in replicate measurements can be a function of the mean value of the replicates. If not taken into account, this can mislead a t-test into identifying a gene (or compound, in our case) as exhibiting non-differential behavior, when in fact it is differentially expressed (or active).

The figure below compares the standard deviation (SD) versus mean of each compound, for each parameter in the two treatments (HA22, an immunotoxin and PBS, the vehicle treatment). Overlaid on the scatter plot is a loess fit. In the lower panel, we see that in the PBS treatment there is minimal dependency of SD on the mean values, except for the case of log AC50. However, for the case of HA22 treatment, each parameter shows a distinct dependence of SD on the mean replicate value.

Many approaches have been designed to address this issue in genomic data (e.g., Huber et al, Durbin et al, Papana & Ishwaran). One of the drawbacks of most approaches is that they assume a distributional model for the errors (which in the case of the small molecule data would correspond to the true parameter value minus the calculated value) or a specific model for the mean-variance relationship. However, to our knowledge, there is no general solution to the problem of choosing an appropriate error distribution for small molecule activity (or curve parameter) data. A non-parametric approach described by Motakis et al employs the observed replicate data to stabilize the variance, avoiding any distributional assumptions. However, one requirement is that the mean-variance relationship be monotonically increasing. From the figure above we see that this is somewhat true for efficacy but does not hold, in a global sense, for the other parameters.
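One crude way to put a number on the monotonicity requirement is to compute the per-compound mean and SD of a parameter across replicates and then look at the Spearman rank correlation between the two. The sketch below does this with the standard library only. It is a diagnostic of my own choosing, not part of the Motakis et al method, and the compound names and replicate values are made up. A value near 1 is consistent with a monotonically increasing relationship, but as a global summary it can hide local departures that a loess overlay would show.

```python
from statistics import mean, stdev

def mean_sd_pairs(replicates):
    """Per-compound (mean, SD) of a curve-fit parameter across replicates."""
    return [(mean(values), stdev(values)) for values in replicates.values()]

def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical efficacy replicates for four compounds in one treatment.
replicates = {
    "cmpd_1": [10.0, 11.0, 9.0],
    "cmpd_2": [40.0, 44.0, 36.0],
    "cmpd_3": [80.0, 90.0, 70.0],
    "cmpd_4": [20.0, 22.5, 17.5],
}
pairs = mean_sd_pairs(replicates)
rho = spearman([m for m, _ in pairs], [s for _, s in pairs])
```

In this made-up example the SD grows in step with the mean, so the rank correlation comes out at its maximum. Running the same check separately for each parameter and treatment would give a quick indication of where a monotone variance-stabilizing approach is even plausible.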

Overall, differential analysis of dose response data is somewhat of an open topic. While simple cases of pure potency or efficacy shifts can be easily analyzed, it can be challenging when all four curve fit parameters change. I’ve also highlighted some of the issues with applying methods devised for genomic data to small molecule data – solutions to these would enable the reuse of some powerful machinery.