## Archive for the ‘hypothesis’ tag

## Ridit Analysis

While preparing material for an R workshop I ran at NCATS I was pointed to the topic of ridit analysis. Since I hadn’t heard of this technique before I decided to look into it and investigate how R could be used for such an analysis (and yes, there is a package for it).

## Why ridit analysis?

First, lets consider why one might consider ridit analysis. In many scenarios one might have data that is categorical, but the categories are ordered. This type of data is termed ordinal (sometimes also called “nominal with order”). An example might be a trial of an analgesics ability to reduce pain, whose outcome could be *no pain*, *some pain*, *extreme pain*. While there are three categories, it’s clear that there is an ordering to them. Analysis of such data usually makes use of methods devised for categorical data – but such methods will not make use of the information contained within the ordering of the categories. Alternatively, one might numerically code the groups using *1*, *2* and *3* and then apply methods devised for continuous or discrete variables. This is not appropriate since one can change the results by simply changing the category coding.

Ridit analysis essentially transforms ordinal data to a probability scale (one could call it a *virtual continuous scale*). The term actually stands for * relative to an identified distribution integral transformation* and is analoguous to probit or logit. (Importantly, ridit analysis is closely related to the Wilcoxon rank sum test. As shown by Selvin, the Wilcoxon test statistic and the mean ridit are directly related).

## Definitions

Essentially, one must have at least two groups, one of which is selected as the reference group. Then for the non-reference group, the mean ridit is an

estimate of the probability that a random individual from that group will have a value on the underlying (virtual) continuous scale greater than or equal to the value for a random individual from the reference group.

So if larger values of the underlying scale imply a worse condition, then the mean ridit is the probability estimate that the random individual from the group is worse of than a random individual from the reference group (based on the interpretation from Bross). Based on the definition of a ridit (see here or here), one can compute confidence intervals (CI) or test the hypothesis that different groups have equal mean ridits. Lets see how we can do that using R

## Mechanics of ridit analysis

Consider a dataset taken from Donaldson, (Eur. J. Pain, 1988) which looked at the effect of high and low levels of radiation treatment on trials participants’s sleep. The numbers are counts of patients:

1 2 3 4 5 6 7 8 9 10 11 12 | sleep <- data.frame(pain.level=factor(c('Slept all night with no pain', 'Slept all night with some pain', 'Woke with pain - medication provided relief', 'Woke with pain - medication provided no relief', 'Awake most or all of night with pain'), levels=c('Slept all night with no pain', 'Slept all night with some pain', 'Woke with pain - medication provided relief', 'Woke with pain - medication provided no relief', 'Awake most or all of night with pain')), low.dose=c(3, 10, 6, 2, 1), high.dose=c(6,10,2,0,0)) |

Here the groups are in the columns (*low.dose* and *high.dose*) and the categories are ordered such tat *Awake most or all of night with pain* is the “maximum” category. To compute the mean ridits for each dose group we first reorder the table and then convert the counts to proportions and then compute ridits for each category (i.e., row).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | ## reorder table sleep <- sleep[ length(levels(sleep$pain.level)):1, ] ## compute proportions sleep$low.dose.prop <- sleep$low.dose / sum(sleep$low.dose) sleep$high.dose.prop <- sleep$high.dose / sum(sleep$high.dose) ## compute riddit ridit <- function(props) { ## props should be in order of levels (highest to lowest) r <- rep(-1, length(props)) for (i in 1:length(props)) { if (i == length(props)) vals <- 0 else vals <- props[(i+1):length(props)] r[i] <- sum(vals) + 0.5*props[i] } return(r) } sleep$low.dose.ridit <- ridit(sleep$low.dose.prop) sleep$high.dose.ridit <- ridit(sleep$high.dose.prop) |

The resultant table is below

1 2 3 4 5 6 | pain.level low.dose high.dose low.dose.prop high.dose.prop low.dose.ridit high.dose.ridit 5 Awake most or all of night with pain 1 0 0.04545455 0.0000000 0.97727273 1.0000000 4 Woke with pain - medication provided no relief 2 0 0.09090909 0.0000000 0.90909091 1.0000000 3 Woke with pain - medication provided relief 6 2 0.27272727 0.1111111 0.72727273 0.9444444 2 Slept all night with some pain 10 10 0.45454545 0.5555556 0.36363636 0.6111111 1 Slept all night with no pain 3 6 0.13636364 0.3333333 0.06818182 0.1666667 |

The last two columns represent the ridit values for each category and can be interpreted as

a probability estimate that an individuals value on the underlying continuous scale is less than or equal to the midpoint of the corresponding interval

The next step (and main point of the analysis) is to compute the mean ridit for a group (essentially the sum of the category proportions for that group weighted by the category ridits in the reference group) , based on a reference. In this case, lets assume the low dose group is the reference.

1 | mean.r.high <- sum(with(sleep, high.dose.prop * low.dose.ridit)) |

which is 0.305, and can be interpreted as the probability that a patient receiving the high dose of radiation will experience more sleep interference than a patient in the low dose group. Importantly, since ridits are estimates of probabilities, the complementary ridit (i.e., using the high dose group as reference) comes out to 0.694 and is the probability that a patient in the low radiation dose group will experience more sleep interferance than a patient in the high dose group.

## Statistics on ridits

There are a number of ways to compute CI’s on mean ridits or else test the hypothesis that the mean ridits differ between \(k\) groups. Donaldsons method for CI calculation appears to be restricted to two groups. In contrast, Fleiss et al suggest an alternate method based on. Considering the latter, the CI for a group vs the reference group is given by

\(\overline{r}_i \pm B \frac{\sqrt{n_s +n_i}}{2\sqrt{3 n_s n_i}}\)

where \(\overline{r}_i\) is the mean ridit for the \(i\)’th group, \(n_s\) and \(n_i\) are the sizes of the reference and query groups, respectively and \(B\) is the multiple testing corrected standard error. If one uses the Bonferroni correction, it would be \(1.96 \times 1\) since there is only two groups being compared (and so 1 comparison). Thus the CI for the mean ridit for the low dose group, using the high dose as reference is given by

\(0.694 \pm 1.96 \frac{\sqrt{18 + 22}}{2\sqrt{3 \times 18 \times 22}}\)

which is 0.515 to 0.873. Given that the interval does not include 0.5, we can conclude that there is a statistically significant difference (\(\alpha = 0.05\)) in the mean ridits between the two groups. For the case of multiple groups, the CI for any group vs any other group (i.e., not considering the reference group) is given by

\( (\overline{r}_i – \overline{r}_j + 0.5) \pm B \frac{\sqrt{n_i + n_j}}{2\sqrt{3 n_i n_j}}\)

Fleiss et al also describes how one can test the hypothesis that the mean ridits across all groups (including the reference) are equal using a \(\chi^2\) statistic. In addition, they also describe how one can perform the same test between any group and the reference group.

## R Implementation

I’ve implemented a function that computes mean ridits and their 95% confidence interval (which can be changed). It expects that the data is provided as counts for each category and that the input data.frame is ordered in descending order of the categories. You need to specify the variable representing the categories and the reference variable. As an example of its usage, we use the dataset from Fleiss et al which measured the degree of pain relief provided by different drugs after oral surgery. We perfrom a ridit analysis using aspirin as the reference group:

1 2 3 4 5 6 7 8 | dental <- data.frame(pain.relief = factor(c('Very good', 'Good', 'Fair', 'Poor', 'None'), levels=c('Very good', 'Good', 'Fair', 'Poor', 'None')), ibuprofen.low = c(61, 17, 10, 6, 0), ibuprofen.high = c(52, 25, 5, 3, 1), Placebo = c(32, 37, 10, 18, 0), Aspirin = c(47, 25, 11, 4, 1) ) ridit(dental, 'pain.relief', 'Aspirin') |

which gives us

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | $category.ridit pain.relief ibuprofen.low ibuprofen.high Placebo 1 Very good 0.67553191 0.697674419 0.83505155 2 Good 0.26063830 0.250000000 0.47938144 3 Fair 0.11702128 0.075581395 0.23711340 4 Poor 0.03191489 0.029069767 0.09278351 5 None 0.00000000 0.005813953 0.00000000 $mean.ridit ibuprofen.low ibuprofen.high Placebo 0.5490812 0.5455206 0.3839620 $ci group low high 1 ibuprofen.low 0.4361128 0.6620496 2 ibuprofen.high 0.4300396 0.6610016 3 Placebo 0.2718415 0.4960826 |

The results suggest that patients receiving either dose of ibuprofen will get better pain relief compared to aspirin. However, if you consider the CI’s it’s clear that they both contain 0.5 and thus there is no statistical difference in the mean ridits for these two doses, compared to aspirin. On the other hand, placebo definitely leads to less pain relief compared to aspirin.

## Correlating Continuous and Categorical Variables

At work, a colleague gave an interesting presentation on characterizing associations between continuous and categorical variables. I expect that I will be facing this issue in some upcoming work so was doing a little reading and made some notes for myself.

Given a continuous variable Y and a categorical variable G, is the distribution of Y independent of the levels of G? One can address this using parametric or non-parametric methods. Due to the superior power of parametric methods we start by considering them and only if we completely fail to satisfy the distributional assumptions of such methods, do we move to the non-parametric methods.

A simple approach is to convert G to an indicator variable matrix, such that each level of G becomes a binary vector, and then use this matrix and Y to fit a linear model. One can then perform an ANOVA on the fit to check whether Y is independent of the groups. This approach assumes that Y is normally distributed, and that variances are equal amongst the groups.

A simple example in R is as follows. We first create a dataset in which there is no difference in the mean or variance across the groups, one in which the means differ and one in which the variances differ

1 2 3 4 5 6 7 8 9 10 11 | G <- factor(sample(c('A','T','C','G'), 100, TRUE, c(0.2,0.2,0.2,0.2))) Y1 <- rnorm(100, mean=10, sd=3) Y2 <- Y1 Y2[which(G=='G')] <- sample(rnorm(length(which(G=='G')),mean=4,sd=3)) Y3 <- Y1 Y3[which(G=='G')] <- sample(rnorm(length(which(G=='G')),mean=10,sd=6)) |

A summary of the Y’s grouped by G are given below.

We now run the ANOVA, using the first dataset.

1 2 | fit <- lm(Y1 ~ G, data=data.frame(Y1,G)) anova(fit) |

Running this gives us

1 2 3 4 5 | Analysis of Variance Table Response: Y1 Df Sum Sq Mean Sq F value Pr(>F) G 3 0.74 0.25 0.0304 0.9928 Residuals 96 777.19 8.10 |

If we consider the 0.05 significance level, we would not reject the null hpothesis that Y is independent of G. If we now rerun the ANOVA using the second dataset

1 2 | fit <- lm(Y2 ~ G, data=data.frame(Y2,G)) anova(fit) |

we get

1 2 3 4 5 | Analysis of Variance Table Response: Y2 Df Sum Sq Mean Sq F value Pr(>F) G 3 1131.65 377.22 49.321 < 2.2e-16 *** Residuals 96 734.22 7.65 |

which suggests that we reject the null hypothesis and accept the alternative that there is the Y’s are not independent of the groups (which is true by design).

Furthermore, it really only indicates whether there is a dependence between the groups and the means of the group members – it doesn’t tell us anything about lack of independence in terms of scale parameters. This is not surprising since the linear model is defined to have a constant variance. Thus, if we want to investigate dependence of the variance on groups, we should look at distributions of residuals or other approaches to identifying heteroscedacity (such as Levenes test).

In case we cannot satisfy the assumptions (even after investigating possible transformations of the data) required for ANOVA, we can move to parametric methods, which are easy to run, but have a lower power. For this problem, the Kruskal-Wallis test and the Fligner-Killeen test come in handy. These methods are in the base **stats** package but also come with the coin package which is handy for running a variety of independence tests.

The Kruskal-Wallis test makes no assumptions about normality, but does assume that the distributions amongst the levels of G are of the same shape and have the same variance (and so, the differences are due to differences in the means). The Fligner-Killeen test also avoids the normality assumption bu in this case, has the null hypothesis that the variances in the groups are the same. Thus ideally, we’d run the Kruskal-Wallis first and if we could not reject the null hypothesis we then investigate whether there is a difference in the variances across the groups by the Fligner-Killeen tests.

Some examples highlight the analysis. First, we consider the dataset that is independent of the groups (Y1):

1 2 3 4 5 6 | > kruskal_test(Y1 ~ G, data=data.frame(Y1,G)) Asymptotic Kruskal-Wallis Test data: Y1 by G (A, C, G, T) chi-squared = 0.0628, df = 3, p-value = 0.9959 |

which indicates that we fail to reject the null hypothesis that the distribution of Y1 is independent of the levels of G. We then consider the dataset where the means are dependent on the groups

1 2 3 4 5 6 | > kruskal_test(Y2 ~ G, data=data.frame(Y2,G)) Asymptotic Kruskal-Wallis Test data: Y2 by G (A, C, G, T) chi-squared = 57.3197, df = 3, p-value = 2.196e-12 |

which indicates the we can reject the null hypothesis. However, note that for Y3, where the means are independent of the groups but the variance differ, the Kruskal-Wallis test is not useful:

1 2 3 4 5 6 | > kruskal_test(Y3 ~ G, data=data.frame(Y3,G)) Asymptotic Kruskal-Wallis Test data: Y3 by G (A, C, G, T) chi-squared = 1.5399, df = 3, p-value = 0.673 |

On the other hand, if we run the Fligner-Killeen test on these three datasets, we correctly identify Y3 as having its variance dependent on G

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | > fligner_test(Y1 ~ G, data=data.frame(Y1,G)) Asymptotic Fligner-Killeen Test data: Y1 by G (A, C, G, T) chi-squared = 2.9914, df = 3, p-value = 0.3929 > fligner_test(Y2 ~ G, data=data.frame(Y2,G)) Asymptotic Fligner-Killeen Test data: Y2 by G (A, C, G, T) chi-squared = 2.8202, df = 3, p-value = 0.4202 > fligner_test(Y3 ~ G, data=data.frame(Y3,G)) Asymptotic Fligner-Killeen Test data: Y3 by G (A, C, G, T) chi-squared = 15.3493, df = 3, p-value = 0.001541 |