Analysis of Variance (ANOVA)

Let’s say you want to find out if the beverage that people drink affects their reaction time. So you set up an experiment with three groups of people. The first group gets water to drink, the second group gets some sugary fruit juice, and the third group gets coffee. Now you test everyone’s reaction time, and you want to know if there’s any difference in reaction time between the groups. The null hypothesis says that the mean reaction time for all three groups is the same. If there were only two groups, you could use a t-test to find out if there’s a difference between them. But when you have three groups or more, you need to use a different approach: the analysis of variance.

When you do the experiment, the scores won’t all be the same. The total variation of all the scores is made up of two parts: the variation within each group, because the people in each group have different reaction times, and the variation between the groups, because each group was given a different drink.

Here’s an example: Look at this set of scores. They’ve been sorted in order to make it easier to see the patterns. You can see that there’s a lot of variation within each group. Some people are faster and some are much slower. But all the groups look pretty much alike. There’s not much variation between the groups. In this case you’d say that most of the difference is due to the people, and the drink didn’t make much of a difference. You would fail to reject the null hypothesis that the type of drink doesn’t have any effect on reaction time.

Now let’s look at a different set of numbers. In this case, all the scores within each group are very close to one another. There’s not a lot of variance within each group. But the groups are very different from one another. There’s a lot of variance between the groups. Here you would reject the null hypothesis, because the type of drink makes a big difference.

So here’s the idea behind analysis of variance: Figure out how much of the total variance comes from the between-groups variance, and how much comes from the within-groups variance. Then take the ratio of the between-groups variance to the within-groups variance. The larger this number is, the more likely it is that the means of the groups really are different and that you should reject the null hypothesis.
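As a rough sketch of that calculation, here is one way to compute the two sources of variation and their ratio in Python. The reaction times are made-up numbers for illustration, not the scores from the examples, and the function name `one_way_anova` is hypothetical. The sketch assumes the usual definition of the ratio, in which each sum of squares is first divided by its degrees of freedom.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, f_ratio) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)            # number of groups
    n_total = len(all_scores)  # total number of scores

    # Between groups: how far each group mean is from the grand mean,
    # weighted by the number of scores in that group.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Within groups: how far each score is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # Divide each sum of squares by its degrees of freedom, then take the ratio.
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ss_between, ss_within, ms_between / ms_within

# Hypothetical reaction times in milliseconds (illustrative only).
water = [310, 295, 330, 305, 320]
juice = [300, 315, 290, 325, 310]
coffee = [270, 285, 260, 295, 280]

ss_between, ss_within, f_ratio = one_way_anova([water, juice, coffee])
print(ss_between, ss_within, f_ratio)
```

The two sums of squares add up to the total variation of all the scores around the grand mean, which is the split into two parts described earlier.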

In the examples, it was obvious where the variance was. Now look at these numbers. You probably can’t tell if there’s a significant effect because it’s not clear whether there’s more variance within groups, or between groups, or how much. The calculations show that the ratio is 4.27, and a ratio that large has a probability of only 0.04 if the null hypothesis is true. That is below the conventional 0.05 cutoff, so in this case you can reject the null hypothesis. With these numbers, the drink you give the people does have an effect on their reaction time.

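What’s that 2, 12 doing in the reported result? Those are the degrees of freedom for the variance between groups and the variance within groups. And here’s how you calculate the degrees of freedom when you report results for analysis of variance: the between-groups value is the number of groups minus one, and the within-groups value is the total number of scores minus the number of groups.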
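To make those numbers concrete, here is a small Python sketch that works out the degrees of freedom and then the probability for the ratio of 4.27. It assumes the usual one-way formulas (number of groups minus one, and total number of scores minus number of groups). The total of 15 scores is inferred from the 12 within-groups degrees of freedom and is not stated in the source. Both function names are hypothetical, and the closed-form probability only holds when there are exactly 2 between-groups degrees of freedom, as with three groups. For other cases a library routine such as SciPy's `scipy.stats.f.sf` would be the more usual choice.

```python
def degrees_of_freedom(num_groups, num_scores):
    """Degrees of freedom for a one-way analysis of variance."""
    df_between = num_groups - 1          # groups minus one
    df_within = num_scores - num_groups  # scores minus groups
    return df_between, df_within

def p_value_three_groups(f_ratio, df_within):
    """P(F >= f_ratio) when the between-groups degrees of freedom equal 2.

    Special case only: with 2 numerator degrees of freedom the upper tail of
    the F distribution reduces to this closed form.
    """
    return (df_within / (df_within + 2.0 * f_ratio)) ** (df_within / 2.0)

# Three groups; 15 scores in total is an inferred assumption, not from the source.
df_between, df_within = degrees_of_freedom(num_groups=3, num_scores=15)
p_value = p_value_three_groups(4.27, df_within)

print(df_between, df_within)  # 2 12
print(round(p_value, 2))      # 0.04
```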

This trick of separating the variance not only works when you have three or more groups; it also works when you have multiple variables. For example, if you test three groups for reaction time in the morning, and you test another three groups in the evening, an analysis of variance can tell you if there’s a significant effect for the type of drink, or if the time of day makes a difference, or if there’s some interaction between the two. Coffee, for instance, might be more effective in the morning than in the evening.
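Here is a hedged sketch of how the same separation might look with two variables, drink and time of day. It assumes a balanced design, meaning every combination of drink and time has the same number of scores, and it uses the standard sums-of-squares split for that case. The data are invented for illustration and the function name `two_way_anova` is hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """Return the ratios for drink, time of day, and their interaction.

    cells[i][j] is the list of scores for drink i at time of day j.
    Assumes a balanced design: every cell holds the same number of scores.
    """
    a = len(cells)        # number of drinks
    b = len(cells[0])     # number of times of day
    n = len(cells[0][0])  # scores per cell

    all_scores = [x for row in cells for cell in row for x in cell]
    grand = mean(all_scores)
    drink_means = [mean([x for cell in row for x in cell]) for row in cells]
    time_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    # Variation explained by each variable on its own.
    ss_drink = b * n * sum((m - grand) ** 2 for m in drink_means)
    ss_time = a * n * sum((m - grand) ** 2 for m in time_means)

    # Variation between cells that the two variables alone do not explain.
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_interaction = ss_cells - ss_drink - ss_time

    # Variation of the scores around their own cell mean.
    ss_within = sum((x - cell_means[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in cells[i][j])
    ms_within = ss_within / (a * b * (n - 1))

    return {
        "drink": (ss_drink / (a - 1)) / ms_within,
        "time": (ss_time / (b - 1)) / ms_within,
        "interaction": (ss_interaction / ((a - 1) * (b - 1))) / ms_within,
    }

# Hypothetical reaction times in milliseconds: [morning, evening] per drink.
scores = [
    [[310, 300, 320], [305, 315, 310]],  # water
    [[300, 310, 305], [308, 298, 303]],  # juice
    [[260, 270, 265], [290, 300, 295]],  # coffee
]
ratios = two_way_anova(scores)
print(ratios)
```

In this made-up data the coffee group is much faster in the morning than in the evening, so the interaction ratio comes out large alongside the ratio for drink.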

So to recap, here’s the main idea of analysis of variance: You figure out how much of the total variance comes from between the groups, and how much comes from within the groups. If most of the variation is between groups, there’s probably a significant effect. If most of the variation is within groups, there’s probably not a significant effect.