Okay, in this lecture, we’re going to talk about tests of significance. Specifically, we’re going to look at two tests that analyze differences between two groups: the independent and the dependent T-Test.
Now, for all of these tests of significance, there are both parametric and nonparametric versions. We are going to focus in this class on the parametric tests. If you ever need a nonparametric test, you can review the text on that and/or do a Google search; you can quickly get up to speed. But in this class we’re just going to focus on the parametric family of tests.
Now, the parametric tests have one significant advantage, and that is that they’re more powerful. Remember from a previous unit, statistical power means the ability of a test to reject a false null hypothesis. Your book says that parametric tests are more sensitive, and that is true too, but they are also more powerful. So they have that main advantage.
Now, with T-Tests, we’re looking at a difference between two groups and trying to determine whether that difference is the result of chance, in this case sampling error or random error. Is this just a chance difference? Or is there something else going on?
So for the T-Test, we have an independent variable that is categorical, measured on a nominal scale, and we can only compare two groups. Our dependent variable is quantitative, typically on an interval or ratio scale. And we can only look at a single outcome measure; if there’s a second outcome measure, we have to run a second T-Test.
Now, in the notation for this, our null hypothesis is H0, and H0 is going to be that μ1 equals μ2. So the population mean for one particular group is equivalent to the population mean of the other group.
Another way we can conceptualize or notate this is that the mean of one group subtracted from the mean of the other group equals 0. So if the two groups are equivalent, if we subtract one mean from the other, we get 0. If μ1 is much greater than μ2, we’ll get a positive difference, ending up over on the right-hand side; and if μ2 is much greater than μ1, the difference will be negative, so we’ll end up on the negative side of the normal distribution. And because the group means are normally distributed, the differences between them are normally distributed too. So we can conceptualize a T-Test with a normal curve as well, just like with Z-scores, but now the horizontal axis represents the result of subtracting one mean from the other, and 0 sits right there in the middle.
So again, we can have a two-tailed test of significance, notated simply as μ1 does not equal μ2. Or a one-tailed test: if μ2 is much greater, that will be a negative difference, with the critical value on the left side of the normal distribution; or if μ1 is significantly larger than μ2, we’ll get a positive difference, and the critical value falls on the right side.
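To put that notation in one place, here’s a compact summary of the hypotheses we just described (this is standard notation, nothing beyond what’s in the lecture):

```latex
% Hypotheses for a two-group t-test on population means
H_0:\ \mu_1 = \mu_2 \quad\Longleftrightarrow\quad \mu_1 - \mu_2 = 0
H_1:\ \mu_1 \neq \mu_2 \qquad \text{(two-tailed)}
H_1:\ \mu_1 > \mu_2 \ \ \text{or}\ \ H_1:\ \mu_1 < \mu_2 \qquad \text{(one-tailed)}
```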
Now, there are two major types of T-Tests and we’re going to look at both of them today. One type is the dependent T-Test and the other type is the independent T-Test.
Now, the dependent T-Test is oftentimes called the paired samples T-Test. This is where the same subject is tested twice; the two scores are not independent of one another because it’s the same subject tested twice. Sometimes this is called a pre-test/post-test setup or a test/retest design. In healthcare, we’ll often take a measurement from a subject, give them some type of medical intervention, and then test them again.
Now, there are several advantages and disadvantages to both types of tests, so I want to talk about a couple of the advantages of the dependent T-Test. The first is that there is no subject-to-subject variance, which means the groups are equivalent at the beginning. A great healthcare-related example is weight loss. If I have two different groups, they could be different to begin with: a group with a high BMI is probably going to lose more weight than a group made up of skinnier people, because the skinnier people have less weight to lose. When you’re measuring the same people twice, there’s no subject-to-subject variance; it’s the identical group. So when there are other variables that could affect the difference, that’s a major advantage of choosing a dependent T-Test.
Now, another major advantage of the dependent T-Test is that extraneous variables are automatically controlled for. To continue with the weight-loss example: if we put people on two different types of diets, they’re not lab rats or hamsters. We can’t lock them in a cage and just measure them a week or a month later. They’re humans; they’re going to go out and live their lives. They may change their activity level, for example, and if one group has significantly more people who increase their activity level, that group will probably show greater weight loss. But if we measure the same group twice, they can still go out and do different things, yet the groups remain identical because they’re the same people.
Another advantage is that dependent T-Tests are considered more powerful than independent T-Tests. We already talked about the definition of power. Because of this, we can often get away with a smaller sample size for a dependent T-Test. So if it’s going to be hard to recruit subjects, you may want to consider a dependent T-Test design.
There is a major disadvantage to the dependent T-Test, and that is this idea of testing subjects twice. What can happen is that the first testing acts as practice: subjects can actually get better because they’re being tested multiple times. This is called the carryover effect, or sometimes the learning effect. The first test can act as practice, so the second score may improve, or move in a predictable way, simply because they’ve seen the test before. They’ve had a chance to practice.
So let’s look at an example of a dependent T-Test. Suppose I have a sample of fifteen male college students, and I’m going to look at alcohol behaviors. I’m going to put them in a one-semester class that takes kind of a scared-straight approach to alcohol. I’m going to try to really reduce their level of drinking, maybe by having them interview people who have been harmed by a drunk driver or have had their lives negatively impacted by excessive drinking. So I’m going to try to get that drinking to go down. I’ll take this group of fifteen males and administer an alcohol behaviors inventory before and after this semester-long course.
So here’s my null hypothesis: the pre-test and post-test scores on the alcohol behaviors inventory will not significantly differ. This is a two-tailed test of significance. The way to notate that is H0: μ1 equals μ2. So if the scores are equivalent, whatever those scores are, subtracting one from the other gives 0.
So this is an example of what the SPSS printout would look like. Variable 1 is the mean score for the pre-test, and variable 2, the second line, is the mean score for the post-test. You see we went from 6.1 to 4.6, so that is a reduction in the mean score. If you follow that across, you’ll see the sample sizes, fifteen in the pre-test and fifteen in the post-test, the same subjects tested twice; the standard deviations of those scores; and then the standard error of the mean. Remember, that’s the estimated standard deviation of the sampling distribution. That’s our descriptive statistics for the two variables. Then we can look at the actual results of the test. We see pair 1, that’s the pair, variable 1 minus variable 2; the mean difference of 1.53; the standard deviation of that difference; the standard error of the mean of that difference; and then the T-score and degrees of freedom (don’t worry, we’re not going to talk about degrees of freedom in this class). But we are going to look at the two-tailed significance, and that’s .000, which is less than .05. So our decision will be to reject the null hypothesis.
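If you ever want to check numbers like these outside of SPSS, here’s a minimal sketch of a paired T-Test in Python with scipy. The scores below are hypothetical pre/post values made up for illustration; they are not the actual data behind this printout:

```python
# Paired (dependent) t-test sketch using scipy.
# The scores are hypothetical pre/post inventory values for 15 subjects,
# not the real data from the study in the lecture.
from scipy import stats

pre  = [7, 6, 8, 5, 7, 6, 9, 5, 6, 7, 5, 6, 8, 4, 6]   # pre-test scores
post = [5, 4, 6, 4, 5, 5, 7, 4, 4, 6, 4, 5, 6, 3, 5]   # post-test scores, same 15 subjects

t_stat, p_value = stats.ttest_rel(pre, post)  # paired samples t-test
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.3f}")

# Decision at alpha = .05
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```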
So let’s look at our conclusion. What do we do with the null hypothesis? We reject it. What type of error could we have committed? Well, incorrect rejection of a null is a type 1 error. And then, given the weakness of the dependent T-Test, what’s a possible explanation? Well, there’s really only one classic weakness of a dependent T-Test, and that’s the carryover effect. So there may be something to the idea that they were given the same survey, the same instrument, twice. Their scores may have gone down simply because they took the test twice. It could be that they got a sense of, “Oh, I know what you’re trying to do; this test is trying to look at whether I’m more sensitive to this behavior, and I’m going to go ahead and give you what you want.”
Now, if you go back to that printout of the SPSS results, I skipped over a section that talked about a confidence interval. Remember, we talked about confidence intervals; I had you watch a video on them a couple of weeks ago. Here’s an interpretation. The mean difference is 1.53, and if we go out roughly two standard errors of the difference in each direction, we get anywhere from a difference of .81 to a difference of 2.25. So we can interpret this by saying we’re 95% confident that the true difference lies between .81 and 2.25.
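For the curious, here’s roughly how that interval is built. The multiplier is the critical t value for 14 degrees of freedom (about 2.145 rather than exactly 2), and the standard error of the difference isn’t quoted above, so the 0.336 here is back-solved from the interval:

```latex
% 95% CI for a mean difference: difference +/- critical t times its standard error
\bar{d} \;\pm\; t_{.975,\,14}\, SE_{\bar{d}}
\;=\; 1.53 \;\pm\; 2.145 \times 0.336
\;\approx\; (0.81,\ 2.25)
```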
Now, dependent T-Tests are not always possible, because sometimes the independent variable can’t be manipulated. I can’t pre-test someone, for example, as a male and post-test them as a female, so I can’t use a dependent T-Test that way. If the independent variable is personality, I can’t pre-test a group as introverts and then test them again as extroverts. Obesity: maybe possible in principle, but logistically it would be pretty hard to pre-test a group as obese and then post-test them at normal weight. Income: pretty impossible to pre-test someone as upper-middle class and post-test the same group as low-income individuals. Religion: you can’t pre-test someone as a Christian and post-test them as a Buddhist, for example. Education level: again, maybe possible, but it would be a logistical nightmare to pre-test someone as a high school dropout and post-test them as having a college education. So sometimes you have to test two different groups.
So the independent T-Test is where you compare the mean scores of two different, independent groups. When you compare the mean scores of any two groups, it’s unlikely you’ll get identical scores; you’re almost always going to get some difference. But what an independent T-Test can tell us is whether that difference is statistically significant.
Now, the classic disadvantages of the independent T-Test are easy to remember because they’re simply the inverse of the advantages of the dependent T-Test. First, you now have subject-to-subject variance, because you have different groups. Second, you can get extraneous variables tangled up, because you can’t control everything about how these people behave while the study is going on. And third, you have to use larger samples with the independent T-Test, because it is generally less powerful than the dependent T-Test.
Now, the major advantage is again pretty easy to remember because it’s just the flip of the disadvantage of the dependent T-Test: with the independent T-Test there is no carryover effect. Sometimes it is very important that the test be a novel experience, something that has never happened to the subjects before. If that’s relevant to the phenomenon you’re studying, you really should consider an independent T-Test. But most of the time, the reason will simply be an independent variable that you can’t manipulate, one of those variables we just talked about.
There is a major assumption with independent T-Tests, and that’s the concept of homogeneity of variances. What that means is that we assume the two populations you’re drawing these groups from have similar variances, similar standard deviations, similar spreads, because you’re going to pool them together and have one number represent both. The book talks about using two different tests; you do that when you hand-calculate it. You can skip over that, because in the real world you’re not going to hand-calculate a T-Test. You’re going to let SPSS do the work, so I’m going to explain how to interpret that in an SPSS printout.
So let’s look at another example. This is similar to the first example, but now we’re going to take fifteen men and fifteen women and we’re going to take the same alcohol behaviors inventory and we’re going to see if the mean scores differ by gender.
So our null hypothesis will be that scores on this alcohol behaviors inventory will not significantly differ between men and women. We can notate that as H0: μF equals μM. So the mean score for females will be equivalent to the mean score for males.
So if we look at our results here, you’ll see, just like with the dependent T-Test, our group statistics, our descriptive statistics, first. We have male and female rows, with fifteen males and fifteen females. You’ll see the mean scores, 5.2 versus 5.4, then the standard deviations for men and women, and the standard errors of the mean for men and women. Then we go down to our test, and this is where we do the test for equality of variances I spoke about. Basically, we have to decide whether to follow line one or line two. So we look at a kind of pre-test here called Levene’s Test for Equality of Variances. If that’s significant, we have to follow line two, which says equal variances not assumed. If it’s above .05, we can follow the first line, where equal variances are assumed. In this case it’s above .05, it’s .355, so we ignore the second line and follow the top line. There you’ll see our T-score; our significance, which is .769; the mean difference; the standard error of the difference; and then our 95% confidence interval. So in this case our significance is above .05, at .769. That’s not really surprising when you look at the actual means: 5.2 versus 5.4, pretty close to begin with, just a couple of tenths of a point apart. So it’s not surprising that in this case we will accept (fail to reject) our null hypothesis.
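Here’s that same two-step logic as a minimal Python sketch with scipy: Levene’s test first, then the appropriate T-Test. The scores are hypothetical inventory values for fifteen men and fifteen women, not the real study data:

```python
# Independent samples t-test sketch with Levene's pre-test, using scipy.
# The scores are hypothetical inventory values, not the real study data.
from scipy import stats

men   = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5, 7, 4, 5, 6, 3]
women = [6, 5, 7, 4, 6, 5, 6, 7, 4, 5, 6, 5, 7, 4, 6]

# Levene's Test for Equality of Variances:
# if p < .05, follow "equal variances not assumed" (Welch's t-test)
lev_stat, lev_p = stats.levene(men, women)
equal_var = lev_p >= 0.05

t_stat, p_value = stats.ttest_ind(men, women, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, equal variances assumed: {equal_var}")
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.3f}")
```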
Okay, let’s interpret these results. What do we do with the null? As I said before, we accept it. What type of error could we have committed? When you incorrectly accept a null that should have been rejected, that’s a type 2 error, so type 2 would be the possible error here. Given the weaknesses of the independent T-Test, what are some possible explanations for an error? One is subject-to-subject variance: the groups could have been different to begin with. If past research suggests that men drink more than women, but in this case we got equivalent groups, maybe we had a disproportionate number of men in the sample who didn’t drink much. Or, inversely, maybe women who drank a lot were overrepresented, and that’s what led to statistically equivalent results. So maybe the groups were different to begin with; that would just be subject-to-subject variance. Or maybe there were extraneous variables that weren’t controlled for: maybe the men in this group went out and did something as a group that changed their behavior in the meantime. That doesn’t seem like a likely scenario, and we don’t know enough details about this study, so based on what we know, I would lean toward subject-to-subject variance as the likely explanation.
We have a confidence interval in this printout as well. We have a mean difference of negative .13, and the confidence interval stretches from negative 1.06 to .79. An important number is contained within that interval, one that really confirms we should accept the null hypothesis: 0 is contained within this confidence interval. Think about the mean difference as the result of subtracting one mean from the other. On the left side of the interval, the negative values represent women scoring higher than men; on the right side, the positive values, up to .79, represent men scoring higher than women. So within our 95% interval, both opposite outcomes are plausible; that’s classic random behavior. That certainly suggests we should accept the null hypothesis.
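If you like seeing that decision rule written down, here’s a tiny sketch using the interval endpoints from the printouts above:

```python
# Two-tailed decision from a 95% confidence interval for (mu1 - mu2):
# if the interval contains 0, the difference is not statistically significant.
def decision_from_ci(lower: float, upper: float) -> str:
    return "fail to reject H0" if lower <= 0 <= upper else "reject H0"

print(decision_from_ci(-1.06, 0.79))  # independent-test interval: fail to reject H0
print(decision_from_ci(0.81, 2.25))   # dependent-test interval: reject H0
```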