Okay, in this lecture we’re going to talk about one-way analysis of variance, or ANOVA for short.
Now with ANOVA, everything is similar to t-tests, but with one small change: we can compare more than two groups. You can’t do that with a t-test. Comparing A to B, B to C, and A to C would take three separate t-tests, and you can’t answer a single research question that way. So if you’re looking at more than two groups, three, four, five, six, seven or more groups, you use an ANOVA.
Now with a simple ANOVA, the null hypothesis just looks like this: mu-1 equals mu-2 equals mu-3, and so on for however many groups you are comparing. And the alternative is that at least one mean score is different from at least one of the other mean scores.
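Written out symbolically, for k groups the hypotheses just described look like this:

```latex
% One-way ANOVA hypotheses for k groups
H_0:\; \mu_1 = \mu_2 = \cdots = \mu_k
\qquad
H_1:\; \mu_i \neq \mu_j \ \text{ for at least one pair } (i, j)
```

Note that the alternative does not say every mean differs from every other mean; it only says at least one pair of means differs.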
Okay, the ANOVA is conceptualized in kind of a different way, so I want to review that with you right now. Say we have three groups. In healthcare, a classic way to compare patients is by age group, so we can put people into one of three groups: pediatric, adult, or geriatric. Now, whatever the dependent variable is, height, weight, whatever measure we’re looking at, we can look at the variance of that dependent variable in two classic ways.
The first is called within-group variance, or sometimes it’s referred to as error. That’s the variation of scores within a particular group. So within the pediatric group, again whatever measure we’re using, there are probably high scores and low scores. Within the adult group, there’s variance. Within the geriatric group, again, variation exists; we’ll have high scores and low scores. That’s the variance we’re not interested in, so we’re going to refer to it as error.
Now what we’re interested in is the second type of variance, and that is called the between-group variance. That’s how the groups differ from each other on average: how the mean score of the geriatric group differs from the mean score of the adult group, or pediatric versus adult, or however many comparisons you want to make. That variance between the groups is what we’re interested in.
So the F-statistic, and the F-statistic is the coefficient we use for ANOVA, is simply the ratio of between-group variance over within-group variance. Under the null hypothesis, that ratio follows a known distribution, the F distribution, so we know how large it should be by chance alone. We put what we’re interested in, the between-group variance, in the numerator and the error, the within-group variance, in the denominator. If that numerator gets large enough relative to the denominator, that suggests the between-group differences are larger than what chance could explain, so we have a real difference. So again, the F-statistic puts the between-group differences in the numerator and the within-group differences in the denominator and looks at that particular ratio.
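That ratio can be computed by hand. Here’s a minimal sketch using made-up scores (not data from the lecture), computing the between-group and within-group variances directly and checking the resulting F-ratio against SciPy’s built-in one-way ANOVA:

```python
# A hand computation of the ANOVA F-ratio on made-up data, checked
# against scipy.stats.f_oneway. The group scores are hypothetical.
import numpy as np
from scipy import stats

groups = [np.array([19.0, 21.0, 20.0, 18.0]),
          np.array([24.0, 25.0, 23.0, 26.0]),
          np.array([25.0, 24.0, 26.0, 27.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k = len(groups)                 # number of groups
n_total = len(all_scores)       # total sample size

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: how far scores sit from their own group mean (the "error")
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # between-group variance
ms_within = ss_within / (n_total - k)    # within-group variance
f_by_hand = ms_between / ms_within       # the F-ratio

f_scipy, p_value = stats.f_oneway(*groups)
print(f_by_hand, f_scipy, p_value)
```

The hand-computed ratio and SciPy’s F-statistic agree, and the p-value tells us how likely an F that large would be by chance alone.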
Now, if we have a significant F-test, that just tells us that there is a difference. It doesn’t tell us where the difference lies. Again, a significant F-score just says that at least one group differs from at least one of the other groups. So we need to do a post hoc test that will tell us the specific nature of the difference, where the difference lies. And there is a whole family of post hoc tests. For this class we’re just going to use kind of the default post hoc test, called the least significant difference, or LSD, post hoc test.
So let’s look at an example. And I’m sorry this isn’t a healthcare example, but it’s a pretty simple one. We’ll look at ACT scores, that ACT test you take in high school, and compare those scores between three classic types of high school: rural, suburban, and urban. So in this case our null hypothesis is that there is no significant difference between the mean ACT scores of students from rural, suburban, and urban schools.
Okay, let’s look at our SPSS printout. We see that ratio I talked about, the F-ratio of the between-group and within-group variances, and then we see the F-score. But what we’re interested in is the significance value, the p-value. We want it to be below .05, and we see that it is: it’s .008, certainly well below .05. So we have a significant difference in ACT scores between the three types of schools, but we don’t know where the differences lie. So we need to perform a post hoc test to learn more.
So we’re going to run a least significant difference test, and what that test does is pull out each group and compare it to the remaining groups. We have three groups, so there are going to be three pairwise comparisons. SPSS makes it very easy for us: if there is a significant difference, it will put an asterisk next to the mean difference. Now, there is some redundancy in the post hoc output. If A is significantly different from B, then B is significantly different from A, so you have to kind of tease out those redundancies. What I do is look at each of the comparisons and see whether there is a box with two asterisks, and we see that does exist in the first box, rural. Rural is significantly different from both urban and suburban. I look at the other two boxes and see that’s just the reverse, and so I know that the difference lies between rural and the other two. But I still don’t know the direction of the difference, whether rural is higher or lower. So I’m going to ask for a means plot, and then I can see visually where the differences lie.
So I ask for both a means plot and a list of descriptive statistics. I see the rural, suburban, and urban numbers shown there graphically, and I see that rural is significantly lower than both urban and suburban. And I get the exact scores there: the rural mean ACT score is 19.66, then suburban is about 24, and urban about 25.
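A means plot is really just the group means drawn as points connected by a line, alongside the descriptive statistics. Here’s a minimal sketch computing those descriptives per group, again on the same hypothetical made-up data:

```python
# Per-group descriptive statistics (n, mean, sd) for a means plot.
# The ACT scores below are hypothetical, not the lecture's data.
import numpy as np

data = {"rural":    np.array([19.0, 20.0, 18.0, 21.0, 20.0]),
        "suburban": np.array([24.0, 23.0, 25.0, 24.0, 25.0]),
        "urban":    np.array([25.0, 26.0, 24.0, 25.0, 26.0])}

means = {}
for name, g in data.items():
    means[name] = g.mean()
    # ddof=1 gives the sample standard deviation, as SPSS reports
    print(f"{name:8s}  n={len(g)}  mean={g.mean():5.2f}  sd={g.std(ddof=1):4.2f}")
```

Plotting those three means makes the direction of the difference obvious at a glance: rural sits below the other two groups.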
One small note. I don’t want to insult anybody who went to a rural high school. This is made-up data. I’m not suggesting that rural high schools are poor or that less intelligent students attend them. This is just made-up data.