# Understanding the Statistical Analysis of Clinical Trial Results

**What are some of the statistical methods that are used to interpret clinical trial results?**

AIDS vaccine candidates are evaluated for safety, their ability to induce immune responses against HIV, and ultimately their efficacy in randomized, controlled, double-blind clinical trials (see *VAX* Oct.-Nov. 2007 *Primer* on *Understanding Randomized, Controlled Clinical Trials*). Biostatisticians, who specialize in statistical analysis, play an important role in how these trials are designed, as well as how the results are analyzed and interpreted.

For the first time, the recently completed RV144 trial in Thailand provided some indication that a combination of HIV vaccine candidates could provide some degree of efficacy (see *Spotlight* article). Although statistical analyses can be complex, understanding them is essential to proper interpretation of clinical trial results, including those from RV144.

**Trial size**

One statistical calculation that occurs before a trial begins is the sample size, or the number of volunteers who need to be enrolled. Some of the volunteers enrolled receive the vaccine candidate(s), while others receive an inactive placebo. All volunteers in clinical trials receive risk-reduction counseling and available HIV prevention strategies, such as condoms, as a way to reduce their risk of infection. Still, some individuals in both vaccine and placebo groups will become HIV infected during the trial through natural exposure to the virus.

Having an accurate estimate of the HIV incidence rate (the number of people who are newly infected with HIV per year) in the population that will be involved in the study is therefore useful in determining the trial size. If the overall incidence in the trial population is low, more volunteers are necessary to observe enough infections to compare the vaccine and placebo groups. Biostatisticians determined that 16,000 volunteers would need to be enrolled in the RV144 trial because volunteers were recruited from the general population and not from specific populations known to be at an increased risk of HIV infection, such as injection-drug users or men who have sex with men.
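The link between incidence and trial size can be sketched in a few lines of Python. This is only a rough illustration: the target of 100 infections, the three years of follow-up, and both incidence rates are hypothetical values chosen for the example, not RV144 design parameters, and the function `volunteers_needed` ignores dropout and the other factors a biostatistician would account for.

```python
import math

def volunteers_needed(target_infections, annual_incidence, years_of_follow_up):
    """Rough number of volunteers needed to expect a target number of infections.

    Assumes infections accumulate at a constant annual rate and ignores dropout.
    """
    expected_per_volunteer = annual_incidence * years_of_follow_up
    return math.ceil(target_infections / expected_per_volunteer)

# Hypothetical values, for illustration only.
higher_incidence = volunteers_needed(100, annual_incidence=0.01, years_of_follow_up=3)
lower_incidence = volunteers_needed(100, annual_incidence=0.002, years_of_follow_up=3)
print(higher_incidence, lower_incidence)
```

With the same target number of infections, the lower-incidence population requires roughly five times as many volunteers.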

Some trials are also designed to continue until a pre-determined number of HIV infections or endpoints occur. This doesn’t require having as precise an estimate of HIV incidence: if the HIV incidence is low, the duration of the trial is longer. The precision with which the efficacy of the vaccine is determined is based on the number of HIV infections that occur during the study, not the total number of volunteers involved.
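A small Python sketch can illustrate why precision tracks the number of infections and not the number of volunteers. It uses a standard large-sample approximation for the standard error of the log risk ratio (a smaller standard error means a more precise estimate). The group size of 8,000 assumes the 16,000 volunteers were split evenly, which is an assumption made here, and the larger trial and the doubled infection counts are hypothetical.

```python
import math

def log_risk_ratio_se(vaccine_infections, placebo_infections, group_size):
    """Approximate standard error of the log risk ratio for two equal-sized groups."""
    return math.sqrt(
        1 / vaccine_infections - 1 / group_size
        + 1 / placebo_infections - 1 / group_size
    )

# Same infection counts, ten times as many volunteers: precision barely changes.
se_base = log_risk_ratio_se(51, 74, group_size=8_000)
se_more_volunteers = log_risk_ratio_se(51, 74, group_size=80_000)

# Twice as many infections with the same split: the standard error shrinks.
se_more_infections = log_risk_ratio_se(102, 148, group_size=8_000)

print(round(se_base, 3), round(se_more_volunteers, 3), round(se_more_infections, 3))
```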

**Efficacy and confidence intervals**

The key to determining the efficacy of a vaccine is comparing the number of HIV infections that occurred in the vaccine and placebo groups. If more infections occur in volunteers who received placebo, as was the case in RV144, researchers can then estimate the efficacy of the vaccine candidates. In RV144, 74 infections occurred among volunteers in the placebo group, while 51 occurred among those who received the full prime-boost regimen. Based on this result, biostatisticians estimated that the efficacy of the vaccine candidates was 31.2%, which means that the vaccine recipients had a 31% lower risk of HIV infection than those who received placebo.
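The basic arithmetic behind an efficacy estimate can be sketched in Python as one minus the ratio of infection risks in the two groups. The group sizes of 8,000 each are an assumption (an even split of the 16,000 volunteers), and the function name `vaccine_efficacy` is made up for this illustration. This crude calculation gives about 31%, close to the reported 31.2%; the published estimate would reflect the actual group sizes and follow-up time.

```python
def vaccine_efficacy(vaccine_infections, placebo_infections,
                     vaccine_group_size, placebo_group_size):
    """Crude vaccine efficacy: 1 minus the ratio of infection risks."""
    vaccine_risk = vaccine_infections / vaccine_group_size
    placebo_risk = placebo_infections / placebo_group_size
    return 1 - vaccine_risk / placebo_risk

# Infection counts from RV144; the equal group sizes are an assumption.
efficacy = vaccine_efficacy(51, 74, 8_000, 8_000)
print(f"{efficacy:.1%}")
```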

But 31.2% is just the best estimate of the vaccine efficacy. Biostatisticians also calculate something known as a confidence interval, which is a range of values around the best estimate of efficacy, all of which are plausible values for the actual efficacy of the vaccine. Confidence intervals provide some perspective about how precise the estimated efficacy is: the wider the confidence interval, the less certain researchers are of the actual efficacy of the vaccine candidates. Take RV144 for example. In the originally reported results from this trial, the confidence interval ranged from 1.2% to 52.1%. The efficacy of the prime-boost regimen could be anywhere in that range, yet the most likely efficacy is the best estimate of 31.2%, which lies within the range but not exactly at its midpoint. Part of the reason for such a wide confidence interval in RV144 is that relatively few HIV infections occurred overall during the trial.
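One common textbook way to approximate such an interval, sketched below in Python, works on the logarithm of the ratio of infection counts. It assumes the two groups were the same size, that infections were rare, and that the interval is a 95% interval (hence `z=1.96`); none of these details are stated above, and this is not necessarily the method the trial's biostatisticians used. Under those assumptions the result comes out close to, but not identical to, the reported 1.2% to 52.1%.

```python
import math

def efficacy_confidence_interval(vaccine_infections, placebo_infections, z=1.96):
    """Approximate confidence interval for efficacy from infection counts.

    Assumes equal-sized groups and rare infections, so the risk ratio is
    approximated by the ratio of the two counts.
    """
    log_ratio = math.log(vaccine_infections / placebo_infections)
    se = math.sqrt(1 / vaccine_infections + 1 / placebo_infections)
    ratio_low = math.exp(log_ratio - z * se)
    ratio_high = math.exp(log_ratio + z * se)
    # Efficacy is 1 minus the ratio, so the bounds swap places.
    return 1 - ratio_high, 1 - ratio_low

lower, upper = efficacy_confidence_interval(51, 74)
print(f"{lower:.1%} to {upper:.1%}")
```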

**Statistical significance**

If there is a difference between the number of HIV infections that occurred in the vaccine and placebo groups, researchers ultimately want to know if this is because the vaccine actually worked, or if it happened merely by chance. There are several calculations biostatisticians use to try to determine this. One commonly used calculation is a p-value, and although it doesn’t provide definitive information about whether the vaccine effect is real, it can provide evidence to suggest that the vaccine did have an effect. A p-value tells researchers how likely it would have been to get the result seen in the trial (74 infections in the placebo group and 51 in the vaccine group), or an even larger difference, if the vaccine had no effect. The less likely this is to occur, the lower the p-value, and the stronger the evidence is that the vaccine actually did have some effect.

Based on the 74-51 split in infections in RV144, statisticians calculated a p-value of 0.04. This means that if the vaccine had no effect whatsoever, there is a 4% chance that this split in infections, or an even larger one, would have occurred anyway. P-values are often misinterpreted. A p-value of 0.04 does not mean that there is only a 1 in 25 chance that the vaccine did not work at all, even though this is how it is commonly described.
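The "if the vaccine had no effect" reasoning can be sketched in Python with a simple exact binomial calculation. If the groups are the same size (an assumption made here) and the vaccine does nothing, each of the 125 infections is equally likely to fall in either group, like a coin flip. The function below, whose name is made up for this illustration, adds up the probability of a split at least as lopsided as 51 to 74 in either direction. It is a simplified stand-in for the trial's actual analysis, so it lands near, but not exactly at, the reported 0.04.

```python
from math import comb

def two_sided_p_value(vaccine_infections, placebo_infections):
    """Chance of a split at least this uneven if each infection were a fair coin flip.

    Assumes equal-sized groups; doubles the probability of the smaller tail.
    """
    total = vaccine_infections + placebo_infections
    smaller = min(vaccine_infections, placebo_infections)
    tail = sum(comb(total, k) for k in range(smaller + 1)) / 2 ** total
    return min(1.0, 2 * tail)

p_value = two_sided_p_value(51, 74)
print(round(p_value, 3))
```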

It is a widely held convention to call any result with a p-value of less than 0.05 statistically significant. However, the 0.05 cut-off point was arbitrarily selected, so statisticians recommend not using this threshold as a hard-and-fast rule for judging whether the vaccine’s efficacy is real. This is particularly true if the p-value is just on the cusp of statistical significance, as is the case in RV144. For example, trials with p-values of 0.06 or 0.04 provide virtually indistinguishable levels of evidence for whether the vaccine efficacy is real, even though one is statistically significant and the other is not.