The normal distribution is a theoretical model of how large samples of interval- or ratio-level data often look once plotted. Because many variables have approximately normal distributions, it is one of the most important concepts in statistics. The normal curve allows probabilities to be calculated. In addition, many inferential statistical tests assume that data are normally distributed, so if your data are not normal, be careful which statistical tests you use with them. In a normal distribution, the measures of central tendency (the mean, median and mode) all fall at the same midline point; in other words, they are equal. How to calculate these measures of central tendency is covered in another video.
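For reference, the bell shape can be written as a density function. Here μ stands for the mean and σ for the standard deviation (notation chosen for this example, not taken from the video). Because x enters the formula only through (x − μ)², the curve is symmetric about μ and highest at μ, which is why the mean, median and mode coincide.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```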
Normal distributions share several key features. They are unimodal, meaning there is only one peak in the distribution. They form a symmetrical, bell-shaped curve: when divided at the mean, the two halves are mirror images of each other. The standard deviation measures how much variation exists in a distribution. A low standard deviation means values lie close to the mean, whereas a high standard deviation means values are spread out over a large range. In a normal distribution, approximately 34% of scores fall between the mean and one standard deviation above the mean. By symmetry, another 34% fall between the mean and one standard deviation below it, so approximately 68% of scores fall within one standard deviation above and below the mean. Approximately 95% of scores fall within two standard deviations above and below the mean.
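The percentages quoted above can be checked against the standard normal distribution. The share of scores within k standard deviations of the mean is Φ(k) − Φ(−k), where Φ is the cumulative distribution function. This is a minimal sketch using Python's built-in `statistics.NormalDist`; the helper names `share_within` and `share_mean_to_plus` are hypothetical, made up for this example.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1, so x is measured in SDs.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs above and below the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def share_mean_to_plus(k: float) -> float:
    """Fraction lying between the mean and k SDs above it."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(0.0)


if __name__ == "__main__":
    print(f"mean to +1 SD: {share_mean_to_plus(1):.2%}")
    for k in (1, 2, 3):
        print(f"within ±{k} SD: {share_within(k):.2%}")
```

Run as a script, this prints 34.13%, 68.27%, 95.45% and 99.73%, which round to the figures used in the video.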
Approximately 99.7% of scores fall within three standard deviations above and below the mean. Z-scores measure how many standard deviations above or below the mean a particular score lies; a z-score is calculated as the score minus the mean, divided by the standard deviation. These scores allow scores to be compared and probabilities to be calculated. Not all samples approximate a normal curve, however. To understand distributions more fully, it is important to understand modality, symmetry and peakedness. A distribution can have more than one peak, and the number of peaks it contains determines its modality. Many distributions, including the normal distribution, have only one main peak, meaning they are unimodal. However, it is possible to have distributions with two or more peaks.
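Counting peaks can be sketched by treating a histogram as a list of bin counts and counting its local maxima. The helper `count_peaks` is hypothetical: it counts every bin (or run of equal bins) that is strictly higher than its neighbors, so with real, noisy data you would first choose sensible bin widths or smooth the counts so that small bumps are not mistaken for main peaks.

```python
from itertools import groupby
from typing import Sequence


def count_peaks(bin_counts: Sequence[int]) -> int:
    """Count peaks in a histogram given its bin counts (hypothetical helper).

    A peak is a bin, or a run of equal adjacent bins, that is strictly higher
    than the bins on both sides. Values beyond either end are treated as 0.
    """
    # Collapse runs of equal heights so a flat-topped peak is counted once,
    # then pad both ends with 0 so edge bins can also count as peaks.
    heights = [0] + [h for h, _ in groupby(bin_counts)] + [0]
    return sum(
        1
        for left, mid, right in zip(heights, heights[1:], heights[2:])
        if mid > left and mid > right
    )


if __name__ == "__main__":
    print(count_peaks([2, 6, 9, 6, 2]))  # one peak
    print(count_peaks([2, 8, 3, 7, 2]))  # two peaks
```

Here `count_peaks([2, 6, 9, 6, 2])` returns 1 and `count_peaks([2, 8, 3, 7, 2])` returns 2.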
Distributions with two peaks are bimodal, and distributions with more than two peaks are multimodal. Symmetry and modality are independent concepts. If the two halves of a distribution are mirror images of each other, so that one half could be superimposed on the other, the distribution is said to be symmetrical. Sometimes data are not symmetrical. If the peak is off-center, one tail of the distribution will be longer than the other, meaning the distribution is skewed. Skewness is a measure of the asymmetry of a distribution. Pearson’s skewness coefficient provides a quick estimate of skewness without lengthy calculation. Recall that normal distributions are symmetrical and bell-shaped. In a perfectly normal distribution, the skewness coefficient equals zero because the mean equals the median. Positive skewness means that there is a pile-up of data on the left, leaving the tail pointing to the right side of the distribution.
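A minimal sketch of Pearson’s coefficient, assuming the common median-based form 3 × (mean − median) / standard deviation and the sample standard deviation from Python's `statistics` module. The function name `pearson_skewness` is hypothetical, made up for this example.

```python
import statistics
from typing import Sequence


def pearson_skewness(data: Sequence[float]) -> float:
    """Pearson's median skewness coefficient: 3 * (mean - median) / standard deviation.

    Zero for symmetrical data, positive when the mean is pulled to the right of
    the median (positive skew), negative when it is pulled to the left.
    Needs at least two data points and a nonzero standard deviation.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return 3 * (mean - median) / sd


if __name__ == "__main__":
    symmetrical = [1, 2, 3, 4, 5]
    right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 14]  # long tail to the right
    print(pearson_skewness(symmetrical))   # 0.0
    print(pearson_skewness(right_skewed))  # positive: mean 4.6 > median 3
```

In the right-skewed sample the two large values pull the mean above the median, so the coefficient comes out positive, matching the rule that the mean follows the long tail.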
In a positively skewed distribution, the tail has been pulled in the positive direction, so the data are skewed to the right and the mean lies to the right of the median. Positive skews tend to be more common than negative ones. Negative skewness means that there is a pile-up of data on the right with a long tail on the left side. The tail has been pulled in the negative direction, and the mean lies to the left of the median. To remember the meaning of positive and negative skew, think of pulling on tails: the tail points in the direction of the skew, and the mean is also pulled in the direction of the long tail. Kurtosis is a measure of the shape of the curve: it indicates whether the bell of the curve is normal, flat, or peaked. Because its calculation is tedious, it is typically done by computer. Fisher’s measure of kurtosis, also called excess kurtosis, subtracts 3 from the raw kurtosis so that a normal distribution receives a coefficient of zero; such a distribution is called mesokurtic.
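Although the calculation is usually left to software, Fisher’s measure is short to write down: the fourth central moment divided by the squared second central moment, minus 3. This sketch uses population (divide-by-n) moments, which is also what `scipy.stats.kurtosis` returns with its default settings; packages that apply a small-sample bias correction will report somewhat different values. The function name `excess_kurtosis` is hypothetical, made up for this example.

```python
from typing import Sequence


def excess_kurtosis(data: Sequence[float]) -> float:
    """Fisher's (excess) kurtosis from population moments: m4 / m2**2 - 3.

    About 0 for normal (mesokurtic) data, positive for peaked (leptokurtic)
    data and negative for flat (platykurtic) data. Raises ZeroDivisionError
    for empty or constant data, where kurtosis is undefined.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


if __name__ == "__main__":
    flat = [1, 2, 3, 4, 5]        # evenly spread, no pronounced peak
    peaked = [0] * 8 + [-5, 5]    # most values piled up at the center
    print(round(excess_kurtosis(flat), 6))    # -1.3
    print(round(excess_kurtosis(peaked), 6))  # 2.0
```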
If the calculation of excess kurtosis results in a large positive number, the distribution is too peaked to be considered normal. This type of data is called leptokurtic: the curve is taller and skinnier than a normal distribution. The beginning of the word sounds like “leapt,” so think of a skinny guy who leapt high in the air. If the calculation of excess kurtosis results in a negative number, the distribution is too flat to be normal and is called platykurtic: the curve is shorter and fatter than a normal distribution. One way to remember this is that the beginning of the word sounds like “plateau,” which is flat. If a distribution is skewed, there is no need to calculate kurtosis to judge normality, since the distribution is already not normal.

Thank you for watching. Please subscribe and explore more of my videos. Let me know what you found helpful and what other information you may need. I look forward to reading your comments.