Welcome everyone to week 3. In our first lecture this week we are going to talk about normal distributions. It’s a very important concept that we are going to carry with us throughout this class.
Now if you remember, when we were talking about histograms I talked about these classic shapes that you can see, and if you observe a lot of histograms you notice these shapes appearing over and over again. The one in the lower right corner that I have highlighted, the symmetric distribution, is very prevalent, and we are going to change the name and refer to it as a normal distribution. Normal distributions are essential in statistics, and they are important for a number of reasons, so I want to outline those for you now.
One reason that normal distributions are important is that they are prevalent. You can see them almost everywhere. Here are just some examples that I have pulled out. First, physical performance is normally distributed, for example how long it takes a high school student to run a mile. If you had that population data, or even a large randomly drawn sample, you would find that it is normally distributed. How long elderly women can hold their breath: normally distributed. The range of motion scores in arthritic males: normally distributed. Other human features, like the weight of the femur or white blood cell counts: normally distributed. Mental performance, meaning any type of exam, such as the ACT or the nursing licensure exam: normally distributed. Items found in nature, like the leaves on a tree, the number of kernels on an ear of corn, or the height of an oak tree: normally distributed. They are important because they are prevalent. We see them in the natural world and, by the way, even in the man-made world. If you know anybody involved in manufacturing (where I work, Peoria, Illinois, is the home of Caterpillar tractor), much of what they measure in their manufacturing process is normally distributed.
The second reason that normal distributions are important to statistics is that this assumption that you are drawing from a normally distributed population is fundamental to inferential statistics. It is an assumption that you will see repeated over and over again, and since it is so prevalent it is a fair assumption.
The final reason I want to give you for why normal distributions are so important is that if you take multiple samples from a population, record their mean scores, and make a distribution of those mean scores, that distribution will be normally distributed. It is such an important concept that it is given its own term. When you create this distribution from multiple mean scores drawn from multiple samples, we refer to that as a sampling distribution. So normal distributions are important to statistics because sampling distributions are normally distributed. We will revisit that concept with the central limit theorem in a week or two.
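If you would like to see a sampling distribution take shape, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population of mile times (mean 8 minutes, standard deviation 1.5) is simulated rather than real data, and the helper name `sample_means` is made up. It draws many samples, records each sample's mean, and then checks that the means pile up around the population mean in a bell-like way.

```python
import random
import statistics

# Hypothetical population: 10,000 mile times in minutes (mean 8, SD 1.5).
# These numbers are invented for illustration; they are not real data.
rng = random.Random(42)
population = [rng.gauss(8.0, 1.5) for _ in range(10_000)]


def sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# The list of sample means is the (simulated) sampling distribution.
means = sample_means(population, sample_size=50, num_samples=2_000, rng=rng)

pop_mean = statistics.fmean(population)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

# Share of sample means within one SD of their center.
# For a bell-shaped distribution this is roughly 0.68.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)

print(f"population mean:        {pop_mean:.3f}")
print(f"mean of sample means:   {mean_of_means:.3f}")
print(f"SD of sample means:     {sd_of_means:.3f}")
print(f"share within one SD:    {within_one_sd:.3f}")
```

The sample means should center on the population mean and spread out far less than the individual mile times do, which is the pattern we will come back to with the central limit theorem.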
Now when we talk about normal distributions, there is really a family of distributions. They all share some characteristics that we are going to review here in a second, but their shapes aren’t identical. They have this general shape that we call a bell shape. It is a curved shape that somewhat resembles a bell, so sometimes they are called bell curves. We are going to refer to them as normal distributions, but they can take on different shapes: some tall, thin and spiky; some short, fat and spread out. They are all normal distributions, and they all share some important characteristics that we are going to take a moment and review.
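To make the idea of a family concrete, here is a small Python sketch that compares two members of it. The two standard deviations (0.5 and 2.0) are arbitrary values picked for illustration; the only thing that changes between the curves is how spread out they are.

```python
from statistics import NormalDist

# Two hypothetical normal distributions with the same center.
# The standard deviations are arbitrary values chosen for illustration.
narrow = NormalDist(mu=0, sigma=0.5)  # tall, thin and spiky
wide = NormalDist(mu=0, sigma=2.0)    # short, fat and spread out

print(" x    narrow   wide")
for x in (-4, -2, -1, 0, 1, 2, 4):
    # pdf(x) is the height of the curve at x.
    print(f"{x:>2}   {narrow.pdf(x):.4f}   {wide.pdf(x):.4f}")
```

The narrow curve is much taller at the center, and the wide curve is taller out in the tails, but both are normal distributions.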
There are several important characteristics of normal distributions we need to review. I think your book refers to them as useful characteristics, so we can call them that. The first characteristic we’ve already talked about. It has that classic bell shape, so it has that curved look to it. Again, it can be tall and thin or short and squat, but it has that classic shape. The second characteristic is that the mean, the median and the mode are all the same value. They are all equivalent to one another, and that value is found at the center of the distribution, since it is the median, and that center is also the peak of the distribution, since it is the mode. So mean equals median equals mode. The third characteristic is that the curve is symmetric about the mean. That is basically a fancy way of saying that the right side mirrors the left side. Finally, the total area under the curve equals 1, and 1 here is really a proportion. We really mean 100%, so the total area, we are going to say, equals 1. That will become an important characteristic that we are going to build on in our next lecture.
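These characteristics can be checked numerically. The sketch below is one way to do it in Python, under some stated assumptions: the distribution (mean 100, standard deviation 15) is a made-up example, the helper `area_under_curve` is a name invented here, and the area is only approximated by adding up thin rectangles between 8 standard deviations below and above the mean.

```python
import math
from statistics import NormalDist

# Hypothetical normal distribution; mean 100 and SD 15 are illustrative values.
dist = NormalDist(mu=100, sigma=15)

# Characteristic 2: mean, median and mode are the same value.
same_center = dist.mean == dist.median == dist.mode

# Characteristic 3: the curve is symmetric about the mean, so the height
# the same distance below and above the mean should match.
symmetric = all(
    math.isclose(dist.pdf(dist.mean - d), dist.pdf(dist.mean + d))
    for d in (5, 15, 30)
)


def area_under_curve(dist, width_in_sds=8, steps=10_000):
    """Approximate the area under the curve with thin midpoint rectangles."""
    low = dist.mean - width_in_sds * dist.stdev
    high = dist.mean + width_in_sds * dist.stdev
    step = (high - low) / steps
    heights = (dist.pdf(low + (i + 0.5) * step) for i in range(steps))
    return sum(heights) * step


# Characteristic 4: the total area under the curve is 1 (that is, 100%).
total_area = area_under_curve(dist)

print("mean == median == mode:", same_center)
print("symmetric about the mean:", symmetric)
print(f"approximate total area: {total_area:.6f}")
```

The area comes out extremely close to 1 rather than exactly 1, because the sum stops at 8 standard deviations and uses rectangles instead of an exact calculation.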
The book talks about four useful characteristics. I want to add a fifth. The curve is theoretically continuous in both directions, so in theory, mathematically, it never touches the horizontal axis and it extends to infinity on both sides. Now in practical terms, depending on what your unit of measure is, it may end. It may be impossible to get a score beyond a certain point, but mathematically that normal curve stretches to infinity on both the right side and the left side.
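Here is a short Python sketch of that fifth characteristic, using the standard normal curve (mean 0, standard deviation 1) as an assumed example. It prints the height of the curve and the proportion of the area lying further out than a given number of standard deviations on either side. The distances chosen (1, 3, 5 and 10) are arbitrary.

```python
import math
from statistics import NormalDist

# Standard normal curve (mean 0, SD 1), used here only as an example.
standard = NormalDist(mu=0, sigma=1)

distances = (1, 3, 5, 10)  # standard deviations from the mean (arbitrary picks)

# Height of the curve at each distance from the mean.
heights = [standard.pdf(z) for z in distances]

# Proportion of the total area further than z SDs from the mean, both sides.
# For a standard normal curve this equals erfc(z / sqrt(2)).
tail_areas = [math.erfc(z / math.sqrt(2)) for z in distances]

for z, height, tail in zip(distances, heights, tail_areas):
    print(f"{z:>2} SDs out: height = {height:.3e}, area beyond = {tail:.3e}")
```

The numbers shrink very quickly, but within this range they stay above zero, which is what it means for the curve to approach the horizontal axis without touching it.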