Welcome to week 3, our second lecture for the week. This lecture we are going to be talking primarily about z-scores.
Before we get to z-scores, I want to cover a quick term that we are going to use, and that is percentile. Many of you probably already know what percentiles are, but I just want to make sure we are all clear on that. A percentile describes where a point in a dataset sits relative to the other scores in the distribution. More specifically, it tells you the percent of scores that fall at or below one particular score in that distribution, and you arrive at that with a really simple equation. You take the number of scores at or below that point of interest, you divide it by the total number of points, N, and then you take that result and multiply it by 100. That gives you the percentile.
So, let’s look at an example. Let’s say we have a data set of 20, and you are interested in where you fall in that data set. You get a score of 80 on a particular exam and that puts you—you can see where you are in the blue, so 16 people got scores below you and three people got higher scores. So you take 16 divided by 20, so 16 got your score or lower and you divide it by the total, you divide it by N, then you multiply it by 100 and you get 80. So that tells you that 80% of the people got scores below you, or you did better than 80% on the test. Another way to say it is your score falls in the 80th percentile.
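As a quick check on that arithmetic, here is a minimal Python sketch of the percentile equation. The function name `percentile_rank` is just an illustrative choice, and the counts are the ones used in the exam example.

```python
def percentile_rank(count_at_or_below: int, total: int) -> float:
    """Percent of scores at or below the point of interest."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    # Multiply before dividing so whole-number cases stay exact.
    return 100 * count_at_or_below / total

# Numbers from the exam example: 16 scores counted out of N = 20.
print(percentile_rank(16, 20))  # 80.0
```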
If you remember, in the previous week we calculated the standard deviation. The standard deviation is actually a unit of measure that we can use to describe where a data point falls within a distribution. The mean is the center of the distribution, so it sits 0 standard deviations away from the center, and we can describe any other point in the dataset by how many of these standardized units it falls from the mean. This is really an additional property of a normal distribution. If we have a normal distribution and we go out one standard deviation on either side of the mean, so one standard deviation to the left and one standard deviation to the right, illustrated there right below the red line, that will contain about 68% of the distribution. So about 68% of all of the values in a normally distributed dataset will be contained within one standard deviation of the mean. If we go out two standard deviations, we will capture quite a bit but not all. We will capture about 95% of the total number in the dataset. If we go out three standard deviations, we will capture almost all. We will capture about 99.7% of the total values in that distribution.
Now you will see what I just said illustrated in this image here: going out one standard deviation captures 68%, two standard deviations 95%, and three standard deviations 99.7%. This is very common, used all the time, and we call it the 68-95-99.7 rule.
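If you want to see where those three percentages come from, here is a small Python sketch. It assumes Python 3.8 or later, which ships `statistics.NormalDist`; the helper name `area_within` is only illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rule rounds these to 68%, 95%, and 99.7%.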
So now when we look at normal distributions we can think about the area under the curve, no matter what the distribution is, in terms of proportions of the total area. In the first image there, the curve has an area of 1. Think of it as 100%. The total area under the curve is equal to 1. If we draw a line at the mean, which in a normal distribution is also the median and the mode, all found at the center, that line splits the normal curve perfectly in half, with 50% of the area, or .5, on the left side and .5 on the right side. Then again, as I said in the last two slides, we have the 68-95-99.7 rule as we move out one standard deviation, two standard deviations, and three standard deviations. We are going to start talking about particular distributions and the proportion of the area under that distribution. I will give you some examples here in a second.
We are going to change terminology here just for a second. In the past couple of slides, I have talked about moving out one standard deviation or two standard deviations. We’re actually now going to refer to that as a z-score, so a z-score is just the unit of measure in standard deviations. So the mean has a z-score of 0. It is found right in the center. So if I go out one standard deviation to the right, that is a z-score of one. If I go one standard deviation to the left of the mean, that is a z-score of −1. So we will start talking about this, instead of referring to it as how many standard deviations, we are going to call it a z-score, and we are going to refer to that number as a z-score.
Now we can calculate a z-score with a very simple formula. With a normal distribution, we can transform any score into a standardized score, that is, a score measured in units of standard deviation, or a z-score. It is simply that particular value minus the mean, divided by the standard deviation: z equals x minus mu, over sigma, or z = (x − μ) / σ.
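Here is a minimal Python sketch of that formula. The function name `z_score` and the sample numbers (a score of 85 from a distribution with mean 70 and standard deviation 10) are made up purely for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations it lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical values, not from the lecture slides.
print(z_score(85, 70, 10))  # 1.5
print(z_score(70, 70, 10))  # 0.0, the mean always has a z-score of 0
```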
Now we are going to work through a number of examples to get you familiar with some z-score problems. For Example 1, we need to find the area under a standard normal curve between a z-score of 0 and a z-score of 2.34. To do that, we won’t even need the z-score formula. We're going to use our table at the back of the book. If you look in your appendix section, go to that z-score table.
If we go to that table, it gives the percent of the total area of the normal curve between the z-score and the mean. That is what our problem is asking us for. To do this, we use both the horizontal and the vertical axis. First we go down the vertical axis until we get to 2.3. That gets us close. Now we’ve got to move over to get the second decimal place. The first column would be 2.30 and the second column 2.31, so we need to move over to 2.34, and I have it circled there. That gives us the area between the mean, so a z of 0, and a z of 2.34. The table value is .4904, so 49.04% of the area is found between the mean and a z of 2.34.
Here you will see an illustration. I think it is always important to start with a picture. These are pretty simple, but they are going to get more complicated. I think it is always important that we start with a picture and indicate the area we are looking for.
Okay, let’s go to problem #2. In problem #2 we need to find the area under a standard normal curve between a z of 0 and a z of −1.75. Now it says use the symmetric property of the normal distribution and your table at the end of the text to find the area. What I mean by that is remember one of the properties of a normal distribution is that it's symmetric about the mean, so the left side is a mirror image of the right side. Our z table does not give negative z-scores, but we don’t need them. It is kind of redundant. We can just solve for the right side so we will treat it as 1.75 and we can find the area for −1.75.
If you go to your z chart at the end of the book and find the area under the curve for a z of 1.75, so go down the vertical column until you find 1.7 and then move across until you find .05, you should find the area is 0.4599. So 45.99%, or we can round up and say 46%, of the area is found between a z of 0 and a z of −1.75. We’ve actually solved for the right side, but again it is a mirror image and we can just flip that over for the left side.
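The table lookups in Examples 1 and 2 can be checked with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`, and the helper name `area_mean_to_z` is only illustrative. The absolute value is what lets a negative z reuse the right-hand side of the curve.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_mean_to_z(z: float) -> float:
    """Area between the mean (z = 0) and z, which is what the z table lists."""
    return abs(std_normal.cdf(z) - 0.5)

print(round(area_mean_to_z(2.34), 4))   # 0.4904  (Example 1)
print(round(area_mean_to_z(1.75), 4))   # 0.4599
print(round(area_mean_to_z(-1.75), 4))  # 0.4599  (Example 2, mirror image)
```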
Okay, let’s go to example #3. Example #3 changes things a little bit, but it is still pretty simple. We need to find the area under the curve to the right of a z of 1.11. We will use our table at the end of the book, and again find 1.1 and then move over to .01, but there is a problem. That gives us the area from the mean to z, so we need to do a little math to find the correct area to the right.
Again, the chart gives us the area under the curve from the mean or a z-score of 0 to our particular value. We have been asked to solve for the area to the right, so we just need to do some simple math, again, because remember the mean splits the area under the curve in half, 50% to the right and 50% to the left. We just need to subtract the value that was given to us in the book, which is .3665. It is the area under the curve from 0 to 1.11, so to find the area to the right of 1.11 we just subtract that value from .5 and that result, .1335, is the area under the curve to the right of a z of 1.11.
Okay, let’s go to example #4. Example #4 is find the area under a normal curve to the left of a z of −1.93. It's kind of the same example as example #2. We are going to now use the symmetric property of the normal distribution to find essentially the left tail of the curve, to the left of −1.93.
Again, we can treat it like it is on the right side. If we look on our z table, we will see that the area from a z of 0 to 1.93 is .4732, so we subtract that from .5 and we get the area of the tail to the right of 1.93. We were asked about a negative z-score, but since the normal distribution is symmetric we can find the left tail by just flipping it over. So the answer would be .0268.
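The two tail problems, Examples 3 and 4, can be sketched in Python as follows. This assumes Python 3.8 or later, and the names `right_tail` and `left_tail` are illustrative. Note that the code works from the cumulative area to the left of z rather than from the table's mean-to-z area, so the subtraction is from 1 instead of from .5.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def left_tail(z: float) -> float:
    """Area under the curve to the left of z."""
    return std_normal.cdf(z)

def right_tail(z: float) -> float:
    """Area under the curve to the right of z."""
    return 1 - std_normal.cdf(z)

print(round(right_tail(1.11), 4))  # 0.1335  (Example 3)
print(round(left_tail(-1.93), 4))  # 0.0268  (Example 4)
```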
Now, example #5 is a little trickier but still very doable. We need to find the area between a z of 2 and a z of 2.47. Now we need to do a little more math, but you can go to the next slide or think it through yourself and see if you can figure it out.
Okay, for this we’ve got to do a little more math but it is still pretty simple. We first find the area under the larger segment, so that would be from the mean to 2.47, and that value is .4932. Then we take the smaller segment from the mean or z of 0 to a z of 2. That area is .4772; we just subtract the smaller from the larger, and that difference is the area between 2 and 2.47, so the area between the z of 2 and a z of 2.47 is .0160.
Now, example #6. We are going to change things a little bit but still pretty simple, if you visualize it especially. Now we’re going to find the area between a z of 1.68 and a z of −1.37. If you think you know it, you don’t need to move ahead—go ahead and try it, and then start back on the lecture and see if you got the right answer.
For this answer, it is just simple addition. We find the area from a z of 0 to z of 1.68. That is 0.4535. And then a z of 0 and a z of −1.37—that area is .4147. Add the two together; the combined area between z of −1.37 and a z of 1.68 is .8682, or 86.82% of the total area can be found between those two z-scores.
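Examples 5 and 6 look different on the table, one a subtraction and one an addition, but in code they are the same operation: the area to the left of the upper z minus the area to the left of the lower z. Here is a minimal sketch, assuming Python 3.8 or later; `area_between` is an illustrative name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z-scores."""
    if z_low > z_high:
        z_low, z_high = z_high, z_low  # accept the bounds in either order
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(area_between(2.0, 2.47), 4))    # 0.016   (Example 5)
print(round(area_between(-1.37, 1.68), 4))  # 0.8682  (Example 6)
```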
Now, those first six exercises were really warm-up exercises to get you used to how the z table works. These are the real story problems. They take a little bit of thought, but these are the ones you really need to focus on. These are going to be more like story problems. Let’s work through example #7, and here is the story. “Each month an American household generates an average of 28 pounds of newspaper for garbage or recycling,” so that is our mu. Our mean is 28 pounds. “Assume the variance is 4 pounds," and "Assume the amount generated is normally distributed,” so we have a normal distribution with a variance of 4 pounds. “If a household is selected at random, find the probability of it generating more than 30.2 pounds per month.” Let’s start with a picture, and basically we are looking for the area to the right of 30.2 pounds.
Now the first thing we need to do is to take that 30.2 pounds and convert it to a z-score. You remember the formula I showed you a couple of minutes ago, so let’s stick 30.2 pounds into that formula. Also, remember that the problem gives us the variance, which is 4, but the formula does not use the variance. Variance is sigma squared, and it is sigma, the standard deviation, that goes in the denominator. So we just take the square root of 4 to convert the variance to the standard deviation, which gives us 2. You’ve got the rest, so you can solve for z.
If we convert our 30.2 pounds to a z-value, you will get 30.2 minus 28, which is 2.2, divided by 2, so a z-score of 1.1. Now we just need to find the area above 1.1. Remember, we want the probability of a household selected at random generating more than 30.2 pounds, so we are looking for the right tail. So we simply do the math. From the table, the area under the curve between the mean and a z of 1.1 is .3643. We subtract that from .5 and we get our area of .1357, or 13.57%.
This is kind of an image of what that is. Again, we are looking for a z to the right of 1.1, so we subtract .3643 from .5 and get .1357.
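Example 7 can be worked step by step in Python, following the same order as the slides: variance to standard deviation, then the z-score, then the right tail. This is a sketch assuming Python 3.8 or later, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 28.0               # mean pounds of newspaper per month
variance = 4.0          # given in the problem
sigma = sqrt(variance)  # standard deviation, 2.0

x = 30.2
z = (x - mu) / sigma              # z-score for 30.2 pounds
p_more = 1 - NormalDist().cdf(z)  # area to the right of z

print(round(z, 2))       # 1.1
print(round(p_more, 4))  # 0.1357
```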
Ok, let's move to example #8. For this example, let’s use the same story problem, our story about garbage, so we will use the same value for mu, our same value for sigma. But now we want to find the probability of drawing a household at random and having them fall between 27 and 31 pounds of garbage per month.
For this, we are going to skip right to the visual representation for this story problem. We first need to take those two values and convert each of them to a z-score. You will find that 27 converts to a z of −0.5 and 31 converts to a z of 1.5, so now we simply need to find the area between a z of −0.5 and a z of 1.5. That is just some simple addition. You find those two areas from the table, .1915 and .4332, add them together, and you should get .6247, or 62.47%.
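For Example 8, here is a Python sketch that takes a slightly different route from the table method. Instead of standardizing first, it builds the distribution in pounds directly and lets `NormalDist` do the conversion. It assumes Python 3.8 or later, and the names are illustrative.

```python
from statistics import NormalDist

# Newspaper distribution in pounds: mean 28, standard deviation 2 (variance 4).
newspaper = NormalDist(mu=28, sigma=2)

# z-scores for the two bounds, shown only to match the slide.
z_low = (27 - newspaper.mean) / newspaper.stdev
z_high = (31 - newspaper.mean) / newspaper.stdev
print(z_low, z_high)  # -0.5 1.5

# Probability of falling between 27 and 31 pounds.
p_between = newspaper.cdf(31) - newspaper.cdf(27)
print(round(p_between, 4))  # 0.6247
```

Either route gives the same area, since standardizing only relabels the horizontal axis.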
Now, we’ve got an upcoming z-score quiz that I’m sure you want to do some practice on, so if you go back online, to this week, you will see I've got some practice problems. You can take a practice quiz, and if you get an answer wrong, you can see me working through the answer. Those practice problems should give you excellent preparation for the quiz. You can also, if you want, search online for "z-score quiz" and you will find a number of other online courses from other universities or community colleges that offer practice problems if you need some additional help. The textbook gives some practice problems, as well.